MongoDB Atlas Monitoring

Track the performance and health of your database

📈 What is Atlas Monitoring?

Atlas Monitoring provides real-time insights into database performance, resource usage, and health metrics. Track queries, connections, memory, and disk usage to optimize performance and prevent issues.


// Example: Monitoring slow queries
{
  "query": { "status": "active" },
  "executionTime": 5200,  // milliseconds
  "timestamp": "2024-01-15T10:30:00Z"
}

Alert:

⚠️ Slow query detected: 5.2 seconds
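The step from the sample record to the alert text can be sketched in a few lines. This is only an illustration: `checkSlowQuery` and its 5000 ms default threshold are hypothetical names and values, not part of Atlas.

```javascript
// Hypothetical helper: return an alert message when a recorded operation
// exceeds the threshold, or null when it is fast enough.
// The record shape mirrors the example above; the default threshold is an assumption.
function checkSlowQuery(record, thresholdMs = 5000) {
  if (record.executionTime <= thresholdMs) return null;
  const seconds = (record.executionTime / 1000).toFixed(1);
  return `Slow query detected: ${seconds} seconds`;
}

const sample = {
  query: { status: "active" },
  executionTime: 5200, // milliseconds
  timestamp: "2024-01-15T10:30:00Z",
};

console.log(checkSlowQuery(sample)); // Slow query detected: 5.2 seconds
```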

Key Metrics

Operations

Track queries, inserts, updates, deletes

{
  "reads": 1250,
  "writes": 340,
  "commands": 89
}
🔌 Connections

Monitor active and available connections

{
  "current": 45,
  "available": 455,
  "total": 500
}
💾 Memory

Track RAM usage and cache efficiency

{
  "resident": "2.5 GB",
  "virtual": "4.1 GB",
  "mapped": "3.2 GB"
}
💿 Disk

Monitor storage usage and IOPS

{
  "used": "45 GB",
  "total": "100 GB",
  "iops": 1200
}
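Raw counts like these are easier to act on as percentages. The sketch below derives connection and disk utilization from the sample values above; `percentOf` and the object field names `usedGB` and `totalGB` are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: percentage of `part` within `whole`, rounded to one decimal place.
function percentOf(part, whole) {
  if (whole <= 0) throw new RangeError("whole must be positive");
  return Math.round((part / whole) * 1000) / 10;
}

// Values copied from the Connections and Disk examples above.
const connections = { current: 45, available: 455, total: 500 };
const disk = { usedGB: 45, totalGB: 100 };

const connectionUse = percentOf(connections.current, connections.total);
const diskUse = percentOf(disk.usedGB, disk.totalGB);

console.log(`Connections: ${connectionUse}% | Disk: ${diskUse}%`);
```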

🔹 Real-Time Metrics Dashboard

The Atlas dashboard displays live performance metrics updated every minute. Monitor operations per second, network traffic, query execution times, and resource utilization to identify bottlenecks and optimize performance.

// Sample metrics data structure
{
  "timestamp": "2024-01-15T10:30:00Z",
  "cluster": "MyCluster",
  "metrics": {
    "operations": {
      "query": 450,
      "insert": 120,
      "update": 80,
      "delete": 15,
      "getmore": 200,
      "command": 95
    },
    "connections": {
      "current": 45,
      "available": 455
    },
    "network": {
      "bytesIn": 2500000,    // bytes per second
      "bytesOut": 5800000,
      "numRequests": 560
    },
    "memory": {
      "resident": 2684354560,  // bytes
      "virtual": 4294967296,
      "mapped": 3435973836
    },
    "disk": {
      "usedBytes": 48318382080,
      "totalBytes": 107374182400,
      "iops": 1200
    }
  }
}

Dashboard View:

Operations/sec: 960 | Connections: 45/500

Memory: 2.5 GB | Disk: 45/100 GB
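The dashboard line can be reproduced from the sample structure: the operation counters sum to 960, and the byte counts divide evenly by 1024^3 (2.5, 45, and 100). The following is a minimal sketch under those assumptions; `summarize` is a hypothetical formatter, not an Atlas function.

```javascript
// Hypothetical formatter for the `metrics` object of the sample structure above.
const GIB = 1024 ** 3; // the sample byte counts are exact multiples of 2^30

function summarize(metrics) {
  // Total operations per second is the sum of all operation counters.
  const opsPerSec = Object.values(metrics.operations).reduce((sum, n) => sum + n, 0);
  const { current, available } = metrics.connections;
  const toGiB = (bytes) => Math.round((bytes / GIB) * 10) / 10;
  return {
    opsPerSec,
    connections: `${current}/${current + available}`,
    memoryGiB: toGiB(metrics.memory.resident),
    disk: `${toGiB(metrics.disk.usedBytes)}/${toGiB(metrics.disk.totalBytes)}`,
  };
}

const summary = summarize({
  operations: { query: 450, insert: 120, update: 80, delete: 15, getmore: 200, command: 95 },
  connections: { current: 45, available: 455 },
  memory: { resident: 2684354560 },
  disk: { usedBytes: 48318382080, totalBytes: 107374182400 },
});

console.log(summary);
```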

🔹 Setting Up Alerts

Configure alerts to notify you when metrics exceed thresholds. Receive notifications via email, SMS, Slack, or PagerDuty when issues occur, enabling proactive database management and quick problem resolution.

// Alert configuration example
{
  "alertConfigId": "alert-001",
  "eventTypeName": "OUTSIDE_METRIC_THRESHOLD",
  "enabled": true,
  "notifications": [
    {
      "typeName": "EMAIL",
      "emailAddress": "dba@example.com",
      "delayMin": 0
    },
    {
      "typeName": "SLACK",
      "channelName": "#database-alerts"
    }
  ],
  "matchers": [
    {
      "fieldName": "CLUSTER_NAME",
      "operator": "EQUALS",
      "value": "MyCluster"
    }
  ],
  "metricThreshold": {
    "metricName": "CONNECTIONS_PERCENT",
    "operator": "GREATER_THAN",
    "threshold": 80,
    "units": "RAW",
    "mode": "AVERAGE"
  }
}

Common Alert Types:

  • High CPU: CPU usage > 80%
  • Low Disk Space: Disk usage > 90%
  • Connection Limit: Connections > 80% of max
  • Slow Queries: Query time > 5 seconds
  • Replication Lag: Lag > 10 seconds
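To make the thresholds concrete, here is a small sketch that checks a metrics snapshot against the five rules listed above. The rule table, the snapshot field names, and the snapshot values are all hypothetical; Atlas evaluates its own alert configurations server-side.

```javascript
// Thresholds come from the list above; field names are assumptions for this example.
const rules = [
  { name: "High CPU", field: "cpuPercent", limit: 80 },
  { name: "Low Disk Space", field: "diskPercent", limit: 90 },
  { name: "Connection Limit", field: "connectionsPercent", limit: 80 },
  { name: "Slow Queries", field: "slowestQuerySeconds", limit: 5 },
  { name: "Replication Lag", field: "replicationLagSeconds", limit: 10 },
];

// Return the names of all rules whose metric is strictly above its limit.
function triggeredAlerts(snapshot) {
  return rules
    .filter((rule) => snapshot[rule.field] > rule.limit)
    .map((rule) => rule.name);
}

// Made-up snapshot for illustration only.
const snapshot = {
  cpuPercent: 91,
  diskPercent: 45,
  connectionsPercent: 9,
  slowestQuerySeconds: 5.2,
  replicationLagSeconds: 2,
};

console.log(triggeredAlerts(snapshot)); // [ 'High CPU', 'Slow Queries' ]
```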

🔹 Query Performance Insights

The Performance Advisor analyzes slow queries and suggests optimizations. It identifies missing indexes and inefficient query patterns, and it provides actionable recommendations to improve database performance.

// Slow query example
{
  "namespace": "mydb.users",
  "query": {
    "status": "active",
    "age": { "$gt": 25 }
  },
  "executionStats": {
    "executionTimeMillis": 5200,
    "totalDocsExamined": 1000000,
    "totalKeysExamined": 0,
    "nReturned": 450
  },
  "recommendation": {
    "type": "CREATE_INDEX",
    "index": { "status": 1, "age": 1 },
    "impact": "HIGH",
    "estimatedImprovement": "95% faster"
  }
}

// Create recommended index
db.users.createIndex({ status: 1, age: 1 });

// After optimization
{
  "executionTimeMillis": 45,  // Much faster!
  "totalDocsExamined": 450,
  "totalKeysExamined": 450
}

Performance Improvement:

✓ Query time reduced from 5.2s to 0.045s (about 99% less time)
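Two numbers summarize why the index helps: how many documents were examined per document returned, and the relative drop in execution time. The sketch below computes both from the statistics above. `scanRatio` and `percentReduction` are hypothetical helpers, and `nReturned: 450` for the optimized run is an assumption (the same query should return the same documents).

```javascript
// Documents examined per document returned; 1 is ideal, large values suggest a scan.
function scanRatio(stats) {
  return stats.nReturned === 0 ? Infinity : stats.totalDocsExamined / stats.nReturned;
}

// Relative reduction in execution time, as a percentage rounded to one decimal place.
function percentReduction(beforeMs, afterMs) {
  return Math.round(((beforeMs - afterMs) / beforeMs) * 1000) / 10;
}

const before = { executionTimeMillis: 5200, totalDocsExamined: 1000000, nReturned: 450 };
const after = { executionTimeMillis: 45, totalDocsExamined: 450, nReturned: 450 }; // nReturned assumed

console.log(Math.round(scanRatio(before))); // 2222
console.log(scanRatio(after)); // 1
console.log(percentReduction(before.executionTimeMillis, after.executionTimeMillis)); // 99.1
```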

🔹 Profiler and Logs

The database profiler captures detailed query information for analysis. View slow queries, execution plans, and operation details to troubleshoot performance issues and optimize database operations.

// Enable profiler (level 1 = slow queries only)
db.setProfilingLevel(1, { slowms: 100 });

// View the five most recent profiler entries
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();

// Sample profiler output
{
  "op": "query",
  "ns": "mydb.orders",
  "command": {
    "find": "orders",
    "filter": { "status": "pending" }
  },
  "keysExamined": 0,
  "docsExamined": 50000,
  "numYield": 391,
  "responseLength": 12500,
  "millis": 245,
  "ts": ISODate("2024-01-15T10:30:00Z"),
  "client": "192.168.1.100",
  "user": "appUser"
}

// Disable profiler
db.setProfilingLevel(0);
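Profiler entries like the one above can be screened for likely collection scans: operations that examined many documents but no index keys. This is a rough heuristic sketched as plain JavaScript over an array of entries (for instance, the result of `db.system.profile.find().toArray()` in the shell). `findUnindexedOps`, its 1000-document cutoff, and the second sample entry are hypothetical.

```javascript
// Hypothetical helper: keep entries that read documents without touching any index keys,
// slowest first. The minDocsExamined cutoff is an arbitrary assumption.
function findUnindexedOps(entries, minDocsExamined = 1000) {
  return entries
    .filter((e) => e.keysExamined === 0 && e.docsExamined >= minDocsExamined)
    .sort((a, b) => b.millis - a.millis)
    .map((e) => ({ ns: e.ns, millis: e.millis, docsExamined: e.docsExamined }));
}

const entries = [
  // Fields taken from the sample profiler output above.
  { op: "query", ns: "mydb.orders", keysExamined: 0, docsExamined: 50000, millis: 245 },
  // Made-up entry for an indexed query, for contrast.
  { op: "query", ns: "mydb.users", keysExamined: 450, docsExamined: 450, millis: 45 },
];

console.log(findUnindexedOps(entries));
```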

🔹 Monitoring with API

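Access monitoring data programmatically using the Atlas Administration API:

// Fetch metrics via the Atlas Administration API (v1.0).
// fetch() has no `params` option, so the query string is built explicitly.
// This endpoint uses HTTP Digest authentication with a programmatic API key,
// which plain fetch() does not negotiate. `digestFetch` is a hypothetical
// stand-in for whatever Digest-capable HTTP client you use.
const groupId = 'YOUR_PROJECT_ID';  // placeholder: Atlas project (group) ID
const processId = 'HOST:PORT';      // placeholder: process to query, as host:port

const url = new URL(
  `https://cloud.mongodb.com/api/atlas/v1.0/groups/${groupId}/processes/${processId}/measurements`
);
url.searchParams.set('granularity', 'PT1M');  // 1 minute intervals
url.searchParams.set('period', 'PT1H');       // Last 1 hour
for (const name of ['CONNECTIONS', 'OPCOUNTER_QUERY', 'SYSTEM_CPU_USER']) {
  url.searchParams.append('m', name);         // repeat `m` once per measurement
}

const response = await digestFetch(url, { method: 'GET' });
if (!response.ok) {
  throw new Error(`Atlas API request failed with status ${response.status}`);
}

const metrics = await response.json();
console.log(metrics.measurements);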
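Once the response is parsed, a common next step is to pull the latest value for each measurement. The sketch below assumes each item in `measurements` has a `name` and a `dataPoints` array of `{ timestamp, value }` objects, where `value` may be null for intervals without data. Treat that shape as an assumption to check against the API reference; `latestValues` and the sample body are hypothetical.

```javascript
// Hypothetical helper: map each measurement name to its most recent non-null value.
function latestValues(body) {
  const result = {};
  for (const m of body.measurements ?? []) {
    const points = (m.dataPoints ?? []).filter((p) => p.value != null);
    result[m.name] = points.length ? points[points.length - 1].value : null;
  }
  return result;
}

// Made-up response body for illustration only.
const sampleBody = {
  measurements: [
    {
      name: "CONNECTIONS",
      dataPoints: [
        { timestamp: "2024-01-15T10:29:00Z", value: 44 },
        { timestamp: "2024-01-15T10:30:00Z", value: 45 },
      ],
    },
    {
      name: "OPCOUNTER_QUERY",
      dataPoints: [
        { timestamp: "2024-01-15T10:29:00Z", value: 450 },
        { timestamp: "2024-01-15T10:30:00Z", value: null },
      ],
    },
    { name: "SYSTEM_CPU_USER", dataPoints: [] },
  ],
};

console.log(latestValues(sampleBody));
```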

🔹 Best Practices

Follow these monitoring guidelines:

  • Set up alerts: Configure alerts for critical metrics
  • Monitor trends: Watch for gradual performance degradation
  • Use indexes: Follow Performance Advisor recommendations
  • Review logs: Check logs regularly for errors and warnings
  • Plan capacity: Monitor growth to plan scaling
  • Test queries: Use explain() to analyze query performance
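As one way to act on the capacity-planning guideline, the sketch below projects how many days remain until disk usage reaches a target share of capacity, assuming growth stays linear between the first and last sample. `daysUntilFull`, the 90% target, and the two usage samples are hypothetical; real growth is rarely this regular.

```javascript
// Hypothetical linear projection: days until usage reaches targetFraction of totalGB.
// Returns Infinity when usage is flat, shrinking, or cannot be measured.
function daysUntilFull(samples, totalGB, targetFraction = 0.9) {
  const first = samples[0];
  const last = samples[samples.length - 1];
  const elapsedDays = (Date.parse(last.date) - Date.parse(first.date)) / 86400000;
  const growthPerDay = (last.usedGB - first.usedGB) / elapsedDays;
  if (!(growthPerDay > 0) || !Number.isFinite(growthPerDay)) return Infinity;
  return (totalGB * targetFraction - last.usedGB) / growthPerDay;
}

// Made-up usage samples for illustration only.
const usage = [
  { date: "2024-01-01", usedGB: 39 },
  { date: "2024-01-15", usedGB: 45 },
];

console.log(Math.round(daysUntilFull(usage, 100))); // 105
```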

🧠 Test Your Knowledge

What should you do when you receive a high CPU usage alert?