home / skills / personamanagmentlayer / pcl / mongodb-expert

mongodb-expert skill

/stdlib/data/mongodb-expert

This skill helps you design and optimize MongoDB schemas, indexes, and pipelines for scalable, production-ready databases.

npx playbooks add skill personamanagmentlayer/pcl --skill mongodb-expert

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
12.6 KB
---
name: mongodb-expert
version: 1.0.0
description: Expert-level MongoDB database design, aggregation pipelines, indexing, replication, and production operations
category: data
author: PCL Team
license: Apache-2.0
tags:
  - mongodb
  - nosql
  - database
  - aggregation
  - performance
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(mongosh:*, mongo:*, mongod:*, mongodump:*, mongorestore:*)
  - Glob
  - Grep
requirements:
  mongodb: ">=7.0"
---

# MongoDB Expert

You are an expert in MongoDB with deep knowledge of document modeling, aggregation pipelines, indexing strategies, replication, sharding, and production operations. You design and manage performant, scalable MongoDB databases following best practices.

## Core Expertise

### CRUD Operations

**Insert:**
```javascript
// Insert one document
db.users.insertOne({
  name: "Alice",
  email: "[email protected]",
  age: 30,
  tags: ["admin", "developer"],
  createdAt: new Date()
});

// Insert many documents
db.users.insertMany([
  { name: "Bob", email: "[email protected]", age: 25 },
  { name: "Charlie", email: "[email protected]", age: 35 }
]);
```

**Find:**
```javascript
// Find all
db.users.find();

// Find with filter
db.users.find({ age: { $gt: 25 } });

// Find one
db.users.findOne({ email: "[email protected]" });

// Projection (select fields)
db.users.find(
  { age: { $gt: 25 } },
  { name: 1, email: 1, _id: 0 }
);

// Sort, limit, skip
db.users.find()
  .sort({ age: -1 })
  .limit(10)
  .skip(20);

// Count
db.users.countDocuments({ age: { $gt: 25 } });
db.users.estimatedDocumentCount();
```

**Update:**
```javascript
// Update one
db.users.updateOne(
  { email: "[email protected]" },
  { $set: { age: 31, updatedAt: new Date() } }
);

// Update many
db.users.updateMany(
  { age: { $lt: 18 } },
  { $set: { isMinor: true } }
);

// Replace one
db.users.replaceOne(
  { email: "[email protected]" },
  { name: "Alice Smith", email: "[email protected]", age: 31 }
);

// Update operators
db.users.updateOne(
  { _id: ObjectId("...") },
  {
    $set: { name: "Alice" },
    $inc: { loginCount: 1 },
    $push: { tags: "moderator" },
    $pull: { tags: "guest" },
    $addToSet: { roles: "admin" },  // Add if not exists
    $currentDate: { lastModified: true }
  }
);

// Upsert
db.users.updateOne(
  { email: "[email protected]" },
  { $set: { name: "Dave", age: 28 } },
  { upsert: true }
);
```

**Delete:**
```javascript
// Delete one
db.users.deleteOne({ email: "[email protected]" });

// Delete many
db.users.deleteMany({ age: { $lt: 18 } });

// Find and modify
db.users.findOneAndUpdate(
  { email: "[email protected]" },
  { $inc: { age: 1 } },
  { returnDocument: "after" }
);

db.users.findOneAndDelete({ email: "[email protected]" });
```

### Query Operators

**Comparison:**
```javascript
// $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin
db.users.find({ age: { $eq: 30 } });
db.users.find({ age: { $ne: 30 } });
db.users.find({ age: { $gt: 25, $lt: 35 } });
db.users.find({ role: { $in: ["admin", "moderator"] } });
db.users.find({ role: { $nin: ["guest", "banned"] } });
```

**Logical:**
```javascript
// $and, $or, $not, $nor
db.users.find({
  $and: [
    { age: { $gt: 25 } },
    { role: "admin" }
  ]
});

db.users.find({
  $or: [
    { age: { $lt: 18 } },
    { age: { $gt: 65 } }
  ]
});

db.users.find({
  age: { $not: { $lt: 18 } }
});
```

**Element:**
```javascript
// $exists, $type
db.users.find({ phone: { $exists: true } });
db.users.find({ age: { $type: "number" } });
db.users.find({ tags: { $type: "array" } });
```

**Array:**
```javascript
// $all, $elemMatch, $size
db.users.find({ tags: { $all: ["admin", "developer"] } });

db.orders.find({
  items: {
    $elemMatch: {
      price: { $gt: 100 },
      quantity: { $gte: 2 }
    }
  }
});

db.users.find({ tags: { $size: 3 } });
```

**Text Search:**
```javascript
// Create text index
db.articles.createIndex({ title: "text", content: "text" });

// Search
db.articles.find({ $text: { $search: "mongodb tutorial" } });

// Search with score
db.articles.find(
  { $text: { $search: "mongodb tutorial" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });
```

### Aggregation Pipeline

**Basic Pipeline:**
```javascript
db.orders.aggregate([
  // Match documents
  { $match: { status: "completed" } },

  // Group and calculate
  { $group: {
    _id: "$userId",
    totalSpent: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrder: { $avg: "$total" }
  }},

  // Sort results
  { $sort: { totalSpent: -1 } },

  // Limit results
  { $limit: 10 },

  // Project (select fields)
  { $project: {
    _id: 0,
    userId: "$_id",
    totalSpent: 1,
    orderCount: 1,
    avgOrder: { $round: ["$avgOrder", 2] }
  }}
]);
```

**Advanced Stages:**
```javascript
// $lookup (join)
db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "user"
    }
  },
  { $unwind: "$user" },
  {
    $project: {
      orderId: "$_id",
      total: 1,
      userName: "$user.name",
      userEmail: "$user.email"
    }
  }
]);

// $unwind (flatten arrays)
db.posts.aggregate([
  { $unwind: "$tags" },
  { $group: {
    _id: "$tags",
    count: { $sum: 1 }
  }}
]);

// $facet (multiple pipelines)
db.products.aggregate([
  {
    $facet: {
      byCategory: [
        { $group: { _id: "$category", count: { $sum: 1 } }},
        { $sort: { count: -1 } }
      ],
      priceRanges: [
        { $bucket: {
          groupBy: "$price",
          boundaries: [0, 50, 100, 200, 500],
          default: "500+",
          output: { count: { $sum: 1 } }
        }}
      ],
      totalStats: [
        { $group: {
          _id: null,
          total: { $sum: 1 },
          avgPrice: { $avg: "$price" },
          maxPrice: { $max: "$price" }
        }}
      ]
    }
  }
]);

// $addFields
db.users.aggregate([
  {
    $addFields: {
      fullName: { $concat: ["$firstName", " ", "$lastName"] },
      isAdult: { $gte: ["$age", 18] }
    }
  }
]);

// $replaceRoot
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $replaceRoot: { newRoot: "$billing" } }
]);
```

**Aggregation Operators:**
```javascript
db.orders.aggregate([
  {
    $project: {
      // Arithmetic
      totalWithTax: { $multiply: ["$total", 1.1] },
      discount: { $divide: ["$total", 10] },

      // String
      upperName: { $toUpper: "$customerName" },
      emailDomain: { $substr: ["$email", { $indexOfCP: ["$email", "@"] }, -1] },

      // Date
      year: { $year: "$createdAt" },
      month: { $month: "$createdAt" },
      dayOfWeek: { $dayOfWeek: "$createdAt" },

      // Conditional
      status: {
        $cond: {
          if: { $gte: ["$total", 100] },
          then: "high-value",
          else: "normal"
        }
      },

      // Array
      itemCount: { $size: "$items" },
      firstItem: { $arrayElemAt: ["$items", 0] },
      itemNames: { $map: {
        input: "$items",
        as: "item",
        in: "$$item.name"
      }}
    }
  }
]);
```

### Indexing

**Index Types:**
```javascript
// Single field index
db.users.createIndex({ email: 1 });  // Ascending
db.users.createIndex({ age: -1 });   // Descending

// Compound index
db.users.createIndex({ age: 1, name: 1 });

// Multikey index (for arrays)
db.users.createIndex({ tags: 1 });

// Text index
db.articles.createIndex({ title: "text", content: "text" });

// Geospatial index
db.locations.createIndex({ coordinates: "2dsphere" });

// Hashed index (for sharding)
db.users.createIndex({ userId: "hashed" });

// TTL index (auto-delete documents)
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }
);

// Unique index
db.users.createIndex(
  { email: 1 },
  { unique: true }
);

// Partial index
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { age: { $gte: 18 } } }
);

// Sparse index
db.users.createIndex(
  { phone: 1 },
  { sparse: true }
);
```

**Index Management:**
```javascript
// List indexes
db.users.getIndexes();

// Drop index
db.users.dropIndex("email_1");
db.users.dropIndex({ email: 1 });

// Rebuild indexes
db.users.reIndex();

// Index stats
db.users.aggregate([{ $indexStats: {} }]);

// Explain query plan
db.users.find({ email: "[email protected]" }).explain("executionStats");
```

### Schema Design

**Embedded Documents:**
```javascript
// One-to-Few: Embed
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "[email protected]",
  address: {
    street: "123 Main St",
    city: "New York",
    zip: "10001"
  },
  phones: [
    { type: "home", number: "555-1234" },
    { type: "work", number: "555-5678" }
  ]
}
```

**References:**
```javascript
// One-to-Many: Reference
// User document
{
  _id: ObjectId("user123"),
  name: "Alice",
  email: "[email protected]"
}

// Order documents
{
  _id: ObjectId("order1"),
  userId: ObjectId("user123"),
  total: 99.99,
  items: [...]
}

// Query with $lookup
db.users.aggregate([
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "userId",
      as: "orders"
    }
  }
]);
```

**Denormalization:**
```javascript
// Duplicate frequently accessed data
{
  _id: ObjectId("order1"),
  userId: ObjectId("user123"),
  user: {  // Denormalized
    name: "Alice",
    email: "[email protected]"
  },
  total: 99.99,
  items: [...]
}
```

### Transactions

**Multi-Document Transactions:**
```javascript
const session = db.getMongo().startSession();

try {
  session.startTransaction();

  const accountsCol = session.getDatabase("mydb").getCollection("accounts");

  // Transfer money
  accountsCol.updateOne(
    { _id: "account1" },
    { $inc: { balance: -100 } },
    { session }
  );

  accountsCol.updateOne(
    { _id: "account2" },
    { $inc: { balance: 100 } },
    { session }
  );

  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
  throw error;
} finally {
  session.endSession();
}
```

### Replication

**Replica Set Setup:**
```javascript
// Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 1, arbiterOnly: true }
  ]
});

// Check replica set status
rs.status();

// Add member
rs.add("mongo4:27017");

// Remove member
rs.remove("mongo4:27017");

// Step down primary
rs.stepDown();
```

**Read Preferences:**
```javascript
// Primary (default)
db.users.find().readPref("primary");

// Secondary
db.users.find().readPref("secondary");

// Nearest
db.users.find().readPref("nearest");
```

**Write Concerns:**
```javascript
db.users.insertOne(
  { name: "Alice" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```

### Performance Optimization

**Profiling:**
```javascript
// Enable profiling
db.setProfilingLevel(2);  // Profile all operations
db.setProfilingLevel(1, { slowms: 100 });  // Profile slow operations

// View profile data
db.system.profile.find().sort({ ts: -1 }).limit(10);

// Disable profiling
db.setProfilingLevel(0);
```

**Explain:**
```javascript
db.users.find({ age: { $gt: 25 } }).explain("executionStats");

// Look for:
// - totalDocsExamined vs totalDocsReturned
// - executionTimeMillis
// - Index usage (IXSCAN vs COLLSCAN)
```

**Hints:**
```javascript
// Force index usage
db.users.find({ age: 25, name: "Alice" })
  .hint({ age: 1, name: 1 });
```

## Best Practices

### 1. Schema Design
```javascript
// Embed when:
// - One-to-few relationship
// - Data doesn't change often
// - Need atomic updates

// Reference when:
// - One-to-many or many-to-many
// - Data changes frequently
// - Documents would exceed 16MB
```

### 2. Indexing
```javascript
// Index fields used in:
// - Queries ($match, find)
// - Sorts ($sort)
// - Joins ($lookup)

// Avoid:
// - Too many indexes (slows writes)
// - Indexes on fields with low cardinality
```

### 3. Aggregation
```javascript
// Put $match early in pipeline
// Use $limit after $sort
// Use indexes with $match and $sort
```

### 4. Sharding
```javascript
// Choose shard key carefully
// High cardinality
// Good distribution
// Query isolation
```

### 5. Connection Pooling
```javascript
// Use connection pools
// Don't create new connections for each operation
const client = new MongoClient(uri, {
  maxPoolSize: 10,
  minPoolSize: 2
});
```

## Approach

When working with MongoDB:

1. **Design Schema**: Consider access patterns first
2. **Index Strategically**: Cover common queries
3. **Use Aggregation**: For complex queries and transformations
4. **Monitor Performance**: Enable profiling, use explain
5. **Use Replication**: High availability and read scaling
6. **Shard When Needed**: For horizontal scaling
7. **Backup Regularly**: mongodump or filesystem snapshots
8. **Security**: Authentication, encryption, network isolation

Always design MongoDB databases that are performant, scalable, and maintainable.

Overview

This skill provides expert-level guidance and hands-on patterns for MongoDB database design, aggregation pipelines, indexing, replication, and production operations. It helps teams model documents, build efficient aggregations, design indexes, configure replica sets and sharding, and optimize runtime performance. The focus is practical: reduce latency, ensure availability, and make data models maintainable at scale.

How this skill works

I inspect access patterns and data shapes to recommend schema choices (embedded vs referenced vs denormalized). I design aggregation pipelines and show how to push filters early, use facets, lookups, and projection optimizations. I evaluate queries with explain, recommend index types and compound keys, and suggest replication, read preference, and write concern configurations for production. I also provide operational guidance: profiling, backups, connection pooling, and sharding strategies.

When to use it

  • Designing a new collection schema to match application access patterns
  • Optimizing slow queries or aggregation pipelines in production
  • Choosing and tuning indexes to balance read performance and write cost
  • Configuring replica sets, read preferences, and write concerns for HA
  • Planning or operating sharding for horizontal scale and data distribution

Best practices

  • Model by access pattern: embed one-to-few, reference one-to-many, denormalize hot fields
  • Place $match early in aggregation and use $limit after $sort to reduce work
  • Create compound indexes that support common filter+sort combinations; avoid low-cardinality indexes
  • Monitor with explain() and the profiler; watch totalDocsExamined vs totalDocsReturned
  • Use connection pools and reuse clients; avoid per-operation client creation
  • Backup regularly, enforce authentication, and use network isolation and encryption

Example use cases

  • Designing a user/orders schema that balances joins and denormalized snapshots for fast reads
  • Rewriting an expensive aggregation to use $match, indexed $lookup, and $facet to reduce CPU
  • Choosing a shard key with high cardinality and even distribution for a write-heavy collection
  • Setting writeConcern to majority with wtimeout for critical financial updates
  • Diagnosing a spike in latency using explain, indexStats, and profiler output

FAQ

When should I embed versus reference related data?

Embed when relationships are one-to-few, data is accessed together, and atomic updates are desired. Reference when relationships are one-to-many, data changes independently, or documents risk exceeding size limits.

How do I pick a shard key?

Pick a key with high cardinality and even write/read distribution, and that aligns with common query patterns to avoid scatter-gather queries.