---
name: database-management-patterns
description: Comprehensive guide for database management patterns covering PostgreSQL and MongoDB including schema design, indexing, transactions, replication, and performance tuning
tags: [database, postgresql, mongodb, sql, nosql, indexing, transactions, replication, sharding, performance, schema-design]
tier: tier-1
---
# Database Management Patterns
A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.
## When to Use This Skill
Use this skill when:
- **Designing database schemas** for new applications or refactoring existing ones
- **Choosing between SQL and NoSQL** databases for your use case
- **Optimizing query performance** with proper indexing strategies
- **Implementing data consistency** with transactions and ACID guarantees
- **Scaling databases** horizontally with sharding and replication
- **Managing high-traffic applications** requiring distributed databases
- **Ensuring data integrity** with constraints, triggers, and validation
- **Troubleshooting performance issues** using explain plans and query analysis
- **Building fault-tolerant systems** with replication and failover strategies
- **Working with complex data relationships** (relational) or flexible schemas (document)
## Core Concepts
### Database Paradigms Comparison
#### Relational Databases (PostgreSQL)
**Strengths:**
- **ACID Transactions**: Strong consistency guarantees
- **Complex Queries**: JOIN operations, subqueries, CTEs
- **Data Integrity**: Foreign keys, constraints, triggers
- **Normalized Data**: Reduced redundancy, consistent updates
- **Mature Ecosystem**: Rich tooling, extensions, community
**Best For:**
- Financial systems requiring strict consistency
- Complex relationships and data integrity requirements
- Applications with structured, well-defined schemas
- Systems requiring complex analytical queries
- Multi-step transactions across multiple tables
#### Document Databases (MongoDB)
**Strengths:**
- **Flexible Schema**: Easy schema evolution, polymorphic data
- **Horizontal Scalability**: Built-in sharding support
- **JSON-Native**: Natural fit for modern application development
- **Embedded Documents**: Denormalized data for performance
- **Aggregation Framework**: Powerful data processing pipeline
**Best For:**
- Rapidly evolving applications with changing requirements
- Content management systems with varied data structures
- Real-time analytics and event logging
- Mobile and web applications with JSON APIs
- Hierarchical or nested data structures
### ACID Properties
**Atomicity**: All operations in a transaction succeed or fail together
**Consistency**: Transactions bring database from one valid state to another
**Isolation**: Concurrent transactions don't interfere with each other
**Durability**: Committed transactions survive system failures
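These guarantees are easiest to see in a failing transaction. A minimal sketch, assuming `accounts` carries a `CHECK (balance >= 0)` constraint:
```sql
-- Atomicity in practice: if the second UPDATE violates the CHECK
-- constraint, the error aborts the transaction and the first UPDATE
-- is rolled back with it (table and constraint are illustrative)
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance - 500 WHERE id = 2;  -- fails: balance would go negative
ROLLBACK;  -- after the error, the session accepts only ROLLBACK (COMMIT would also roll back)
```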
### CAP Theorem
In a distributed system, you can fully guarantee at most two of the following three properties:
- **Consistency**: All nodes see the same data at the same time
- **Availability**: Every request receives a response
- **Partition Tolerance**: The system continues operating despite network failures
Because network partitions are unavoidable in practice, the real trade-off is between consistency and availability during a partition. PostgreSQL emphasizes CP (Consistency + Partition Tolerance); MongoDB can be configured toward CP or AP depending on its write and read concerns.
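A sketch of how those concerns shift MongoDB along this spectrum (collection names are illustrative):
```javascript
// Leaning CP: acknowledge writes only once a majority of members have
// them, and read only majority-committed data
db.orders.insertOne({ customer_id: "123" }, { writeConcern: { w: "majority" } })
db.orders.find({ status: "pending" }).readConcern("majority")

// Leaning AP: acknowledge on the primary alone and allow secondary reads
// (lower latency, but reads may be stale or later rolled back)
db.orders.insertOne({ customer_id: "123" }, { writeConcern: { w: 1 } })
db.orders.find({ status: "pending" }).readPref("secondaryPreferred")
```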
## PostgreSQL Patterns
### Schema Design Fundamentals
#### Normalization Levels
**First Normal Form (1NF)**
- Atomic values (no arrays or lists in columns)
- Each row is unique (primary key exists)
- No repeating groups
**Second Normal Form (2NF)**
- Meets 1NF requirements
- All non-key attributes depend on the entire primary key
**Third Normal Form (3NF)**
- Meets 2NF requirements
- No transitive dependencies (non-key attributes depend only on primary key)
**When to Denormalize:**
- Read-heavy workloads where joins are expensive
- Frequently accessed aggregate data
- Historical snapshots that shouldn't change
- Performance-critical queries
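As a concrete example, here is a 3NF decomposition sketch: customer attributes depend on the customer, not the order, so they move to their own table (names are illustrative):
```sql
-- Violates 3NF: customer_city depends on customer_id, not on the order
-- key, so every customer change must be repeated across all their orders
CREATE TABLE orders_denormalized (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    customer_name VARCHAR(255),
    customer_city VARCHAR(100),
    amount NUMERIC(10, 2)
);
-- 3NF: each non-key attribute depends only on its own table's key
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    city VARCHAR(100)
);
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount NUMERIC(10, 2) NOT NULL
);
```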
#### Table Design Patterns
**Primary Keys:**
```sql
-- Serial auto-increment (traditional)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- UUID for distributed systems
CREATE TABLE accounts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Composite primary key
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price NUMERIC(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
```
**Foreign Key Constraints:**
```sql
-- Cascade delete: Remove child records when parent deleted
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
-- Set null: Preserve child records, nullify reference
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INTEGER,
user_id INTEGER,
content TEXT NOT NULL,
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);
-- Restrict: Prevent deletion if child records exist
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL,
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);
```
### Advanced Constraints
**Check Constraints:**
```sql
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
-- Table-level check constraint
CREATE TABLE date_ranges (
id SERIAL PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (end_date > start_date)
);
```
**Unique Constraints:**
```sql
-- Single column unique
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
-- Composite unique constraint
CREATE TABLE user_permissions (
user_id INTEGER NOT NULL,
permission_id INTEGER NOT NULL,
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE (user_id, permission_id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (permission_id) REFERENCES permissions(id)
);
-- Partial unique index (unique where condition met)
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;
```
### Triggers and Functions
**Audit Trail Pattern:**
```sql
-- Audit table
CREATE TABLE audit_log (
id SERIAL PRIMARY KEY,
table_name VARCHAR(255) NOT NULL,
record_id INTEGER NOT NULL,
action VARCHAR(10) NOT NULL,
old_data JSONB,
new_data JSONB,
changed_by VARCHAR(255),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Trigger function
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
RETURN OLD;
END IF;
END;
$$ LANGUAGE plpgsql;
-- Attach trigger to table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
```
**Timestamp Update Pattern:**
```sql
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();
```
### Views and Materialized Views
**Standard Views:**
```sql
-- Virtual table - computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
u.id,
u.username,
u.email,
COUNT(p.id) as post_count,
MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;
-- Use view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;
```
**Materialized Views:**
```sql
-- Physical table - stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
u.id,
u.username,
COUNT(DISTINCT p.id) as total_posts,
COUNT(DISTINCT c.id) as total_comments,
AVG(p.views) as avg_post_views,
MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;
-- Create index on materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);
-- Refresh materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;
-- Concurrent refresh (allows reads during refresh; requires a unique index on the view)
CREATE UNIQUE INDEX idx_user_stats_id ON user_statistics(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```
## MongoDB Patterns
### Document Modeling Strategies
#### Embedding vs Referencing
**Embedding Pattern (Denormalization):**
```javascript
// One-to-few: Embed when relationship is contained
// Example: Blog post with comments
{
_id: ObjectId("..."),
title: "Database Design Patterns",
author: "John Doe",
content: "...",
published_at: ISODate("2025-01-15"),
comments: [
{
_id: ObjectId("..."),
author: "Jane Smith",
text: "Great article!",
created_at: ISODate("2025-01-16")
},
{
_id: ObjectId("..."),
author: "Bob Johnson",
text: "Very helpful, thanks!",
created_at: ISODate("2025-01-17")
}
],
tags: ["database", "design", "patterns"],
stats: {
views: 1523,
likes: 89,
shares: 23
}
}
// Benefits:
// - Single query to retrieve post with comments
// - Better read performance
// - Atomic updates to entire document
//
// Drawbacks:
// - Document size limits (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere
```
**Referencing Pattern (Normalization):**
```javascript
// One-to-many or many-to-many: Reference when relationship is unbounded
// Example: User with many posts
// Users collection
{
_id: ObjectId("507f1f77bcf86cd799439011"),
username: "john_doe",
email: "[email protected]",
profile: {
bio: "Software engineer",
avatar_url: "https://...",
location: "San Francisco"
},
created_at: ISODate("2024-01-01")
}
// Posts collection (references user)
{
_id: ObjectId("507f191e810c19729de860ea"),
user_id: ObjectId("507f1f77bcf86cd799439011"),
title: "My First Post",
content: "...",
published_at: ISODate("2025-01-15"),
comment_ids: [
ObjectId("..."),
ObjectId("...")
]
}
// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data
```
**Hybrid Approach (Selective Denormalization):**
```javascript
// Store frequently accessed fields from referenced document
{
_id: ObjectId("..."),
title: "Database Patterns",
content: "...",
author: {
// Embedded: frequently accessed, rarely changes
id: ObjectId("507f1f77bcf86cd799439011"),
username: "john_doe",
avatar_url: "https://..."
},
// Reference: full user data available if needed
author_id: ObjectId("507f1f77bcf86cd799439011"),
published_at: ISODate("2025-01-15")
}
// Benefits:
// - Fast reads with embedded frequently-used data
// - Can still get full user data when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Need to update embedded data when user changes username/avatar
// - Slightly larger documents
```
### Schema Design Patterns
**Bucket Pattern (Time-Series Data):**
```javascript
// Instead of one document per measurement:
// BAD: Millions of tiny documents
{
sensor_id: "sensor_001",
timestamp: ISODate("2025-01-15T10:00:00Z"),
temperature: 72.5,
humidity: 45
}
// GOOD: Bucket documents with arrays of measurements
{
sensor_id: "sensor_001",
date: ISODate("2025-01-15"),
hour: 10,
measurements: [
{ minute: 0, temperature: 72.5, humidity: 45 },
{ minute: 1, temperature: 72.6, humidity: 45 },
{ minute: 2, temperature: 72.4, humidity: 46 },
// ... up to 60 measurements per hour
],
summary: {
count: 60,
avg_temperature: 72.5,
min_temperature: 71.8,
max_temperature: 73.2
}
}
// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges
```
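To keep buckets current, writes typically use an upsert that appends the reading and maintains the running summary. A sketch assuming a `sensor_data` collection shaped like the document above (averages still need a periodic recompute from `measurements`):
```javascript
// Append a reading to the current hour's bucket, creating the bucket
// on first write; $min/$max maintain the extremes incrementally
db.sensor_data.updateOne(
  { sensor_id: "sensor_001", date: ISODate("2025-01-15"), hour: 10 },
  {
    $push: { measurements: { minute: 3, temperature: 72.7, humidity: 46 } },
    $inc: { "summary.count": 1 },
    $min: { "summary.min_temperature": 72.7 },
    $max: { "summary.max_temperature": 72.7 }
  },
  { upsert: true }
)
```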
**Computed Pattern (Pre-Aggregated Data):**
```javascript
// Store computed values to avoid expensive aggregations
{
_id: ObjectId("..."),
product_id: "PROD-123",
month: "2025-01",
total_sales: 15420.50,
units_sold: 234,
unique_customers: 187,
avg_order_value: 65.90,
top_customers: [
{ customer_id: "CUST-456", revenue: 890.50 },
{ customer_id: "CUST-789", revenue: 675.25 }
],
computed_at: ISODate("2025-02-01T00:00:00Z")
}
// Update pattern: Scheduled job or trigger updates computed values
```
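One way to implement that scheduled job is a `$merge` aggregation that re-derives the month's rollups from raw orders and upserts them into the summary collection. A sketch reusing the order shape from the aggregation examples below; the `product_monthly_stats` collection name is an assumption:
```javascript
db.orders.aggregate([
  { $match: {
      status: "completed",
      created_at: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") }
  } },
  { $unwind: "$items" },
  { $group: {
      _id: { product_id: "$items.product_id", month: "2025-01" },
      total_sales: { $sum: "$items.total_price" },
      units_sold: { $sum: "$items.quantity" },
      customers: { $addToSet: "$customer_id" }
  } },
  { $addFields: {
      unique_customers: { $size: "$customers" },
      computed_at: "$$NOW"
  } },
  { $project: { customers: 0 } },
  // Upsert each rollup document, replacing any previous run's output
  { $merge: { into: "product_monthly_stats", whenMatched: "replace", whenNotMatched: "insert" } }
])
```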
**Polymorphic Pattern (Varied Schemas):**
```javascript
// Handle different product types in single collection
{
_id: ObjectId("..."),
type: "book",
name: "Database Design",
price: 49.99,
// Book-specific fields
isbn: "978-0-123456-78-9",
author: "John Smith",
pages: 456,
publisher: "Tech Books Inc"
}
{
_id: ObjectId("..."),
type: "electronics",
name: "Wireless Mouse",
price: 29.99,
// Electronics-specific fields
brand: "TechBrand",
warranty_months: 24,
specifications: {
battery_life: "6 months",
connectivity: "Bluetooth 5.0"
}
}
// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })
```
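Schema flexibility does not rule out guardrails: the fields every type shares can still be enforced with `$jsonSchema` validation. A sketch; adjust the required fields and types to your catalog:
```javascript
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["type", "name", "price"],
      properties: {
        type:  { enum: ["book", "electronics"] },  // known product types
        name:  { bsonType: "string" },
        price: { bsonType: "number", minimum: 0 }
      }
    }
  },
  validationAction: "error"  // reject writes that fail validation
})
```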
### Aggregation Framework
**Basic Aggregation Pipeline:**
```javascript
// Group by author and count posts
db.posts.aggregate([
{
$match: { published: true } // Filter stage
},
{
$group: {
_id: "$author_id",
total_posts: { $sum: 1 },
total_views: { $sum: "$views" },
avg_views: { $avg: "$views" },
latest_post: { $max: "$published_at" }
}
},
{
$sort: { total_posts: -1 } // Sort by post count
},
{
$limit: 10 // Top 10 authors
}
])
```
**Advanced Pipeline with Lookup (Join):**
```javascript
// Join posts with user data
db.posts.aggregate([
{
$match: {
published_at: { $gte: ISODate("2025-01-01") }
}
},
{
$lookup: {
from: "users",
localField: "author_id",
foreignField: "_id",
as: "author"
}
},
{
$unwind: "$author" // Flatten author array
},
{
$project: {
title: 1,
content: 1,
views: 1,
"author.username": 1,
"author.email": 1,
days_since_publish: {
$divide: [
{ $subtract: [new Date(), "$published_at"] },
1000 * 60 * 60 * 24
]
}
}
},
{
$sort: { views: -1 }
}
])
```
**Aggregation with Grouping and Reshaping:**
```javascript
// Complex aggregation: Sales analysis
db.orders.aggregate([
{
$match: {
status: "completed",
created_at: {
$gte: ISODate("2025-01-01"),
$lt: ISODate("2025-02-01")
}
}
},
{
$unwind: "$items" // Flatten order items
},
{
$group: {
_id: {
product_id: "$items.product_id",
customer_region: "$customer.region"
},
total_quantity: { $sum: "$items.quantity" },
total_revenue: { $sum: "$items.total_price" },
order_count: { $sum: 1 },
avg_order_value: { $avg: "$items.total_price" }
}
},
{
$group: {
_id: "$_id.product_id",
regions: {
$push: {
region: "$_id.customer_region",
quantity: "$total_quantity",
revenue: "$total_revenue"
}
},
total_quantity: { $sum: "$total_quantity" },
total_revenue: { $sum: "$total_revenue" }
}
},
{
$sort: { total_revenue: -1 }
}
])
```
## Indexing Strategies
### PostgreSQL Indexes
**B-tree Indexes (Default):**
```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters!)
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);
-- Query can use index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';
-- Query CANNOT use the index efficiently:
-- SELECT * FROM posts WHERE published_at > '2025-01-01';
-- (the leading column author_id is unconstrained, so the index cannot be used for a seek)
```
**Partial Indexes:**
```sql
-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;
-- Index only recent orders
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';
-- Benefits: Smaller index size, faster queries on filtered data
```
**Expression Indexes:**
```sql
-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));
-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index on JSONB field extraction
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));
```
**Full-Text Search Indexes:**
```sql
-- Add tsvector column for full-text search
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;
-- Populate tsvector column
UPDATE articles
SET tsv_content = to_tsvector('english', title || ' ' || content);
-- Create GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);
-- Full-text search query
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;
-- Trigger to auto-update tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
```
**JSONB Indexes:**
```sql
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);
-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';
-- Index on specific JSONB path
CREATE INDEX idx_products_category
ON products((metadata->>'category'));
```
**Index Monitoring:**
```sql
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check index usage
SELECT
relname as table_name,
indexrelname as index_name,
idx_scan as times_used,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```
### MongoDB Indexes
**Single Field Indexes:**
```javascript
// Create index on single field
db.users.createIndex({ email: 1 }) // 1 = ascending, -1 = descending
// Unique index
db.users.createIndex({ username: 1 }, { unique: true })
// Sparse index (only index documents with the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
```
**Compound Indexes:**
```javascript
// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })
// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with sort by published_at
// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use index efficiently)
// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
status: 1, // Equality
created_at: -1, // Sort
total_amount: 1 // Range
})
```
**Multikey Indexes (Array Fields):**
```javascript
// Index on array field
db.posts.createIndex({ tags: 1 })
// Document with array
{
_id: ObjectId("..."),
title: "Database Design",
tags: ["database", "mongodb", "schema"]
}
// Query that uses multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })
// Compound multikey index (max one array field)
db.posts.createIndex({ tags: 1, published_at: -1 }) // Valid
// db.posts.createIndex({ tags: 1, categories: 1 }) // Invalid if both are arrays
```
**Text Indexes:**
```javascript
// Create text index for full-text search
db.articles.createIndex({
title: "text",
content: "text"
})
// Text search query
db.articles.find({
$text: { $search: "database design patterns" }
})
// Search with relevance score
db.articles.find(
{ $text: { $search: "database design" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
// Weighted text index (prioritize title over content)
db.articles.createIndex(
{ title: "text", content: "text" },
{ weights: { title: 10, content: 5 } }
)
```
**Geospatial Indexes:**
```javascript
// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })
// Document format
{
name: "Coffee Shop",
coordinates: {
type: "Point",
coordinates: [-122.4194, 37.7749] // [longitude, latitude]
}
}
// Find locations near a point
db.locations.find({
coordinates: {
$near: {
$geometry: {
type: "Point",
coordinates: [-122.4194, 37.7749]
},
$maxDistance: 1000 // meters
}
}
})
```
**Index Properties:**
```javascript
// TTL Index (auto-delete documents after time)
db.sessions.createIndex(
{ created_at: 1 },
{ expireAfterSeconds: 3600 } // 1 hour
)
// Partial Index (index subset of documents)
db.orders.createIndex(
{ status: 1, created_at: -1 },
{ partialFilterExpression: { status: { $eq: "pending" } } }
)
// Case-insensitive index
db.users.createIndex(
{ email: 1 },
{ collation: { locale: "en", strength: 2 } }
)
// Note: the legacy { background: true } option is ignored since MongoDB 4.2;
// all index builds now use an optimized process that only briefly locks the collection
db.large_collection.createIndex({ field: 1 })
```
**Index Analysis:**
```javascript
// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")
// Check index usage
db.posts.aggregate([
{ $indexStats: {} }
])
// List all indexes on collection
db.posts.getIndexes()
// Drop unused index
db.posts.dropIndex("index_name")
```
## Transactions
### PostgreSQL Transaction Management
**Basic Transactions:**
```sql
-- Explicit transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- or ROLLBACK; to cancel changes
```
**Savepoints (Partial Rollback):**
```sql
BEGIN;
UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';
SAVEPOINT before_audit;
INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in audit log
ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back
-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');
COMMIT;
```
**Isolation Levels:**
```sql
-- Read Uncommitted (not supported in PostgreSQL, defaults to Read Committed)
-- Read Committed (default) - sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Repeatable Read - sees snapshot at transaction start
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- Returns balance 1000
-- Another transaction updates balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1; -- Still returns 1000 (repeatable read)
COMMIT;
-- Serializable - strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;
```
**Advisory Locks (Application-Level Locking):**
```sql
-- Exclusive lock on arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform critical operation ...
SELECT pg_advisory_unlock(12345);
-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345); -- Returns true if acquired, false otherwise
-- Session-level advisory lock (auto-released on disconnect)
SELECT pg_advisory_lock(user_id);
```
**Row-Level Locking:**
```sql
-- SELECT FOR UPDATE - lock rows for update
BEGIN;
SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- Locks this row
UPDATE products SET quantity = quantity - 1 WHERE id = 123;
COMMIT;
-- SELECT FOR SHARE - shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;
-- SKIP LOCKED - skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;
```
### MongoDB Transactions
**Multi-Document Transactions:**
```javascript
// Transactions require replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()
try {
const accountsCol = session.getDatabase("mydb").accounts
// Debit account
accountsCol.updateOne(
{ _id: "account1" },
{ $inc: { balance: -100 } },
{ session }
)
// Credit account
accountsCol.updateOne(
{ _id: "account2" },
{ $inc: { balance: 100 } },
{ session }
)
// Commit transaction
session.commitTransaction()
} catch (error) {
// Abort on error
session.abortTransaction()
throw error
} finally {
session.endSession()
}
```
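Newer shells and drivers also offer a callback form that handles the commit/abort boilerplate, including automatic retry of transient transaction errors. A minimal mongosh sketch:
```javascript
const session = db.getMongo().startSession()
try {
  session.withTransaction(() => {
    // Collections obtained via the session run inside the transaction
    const accounts = session.getDatabase("mydb").accounts
    accounts.updateOne({ _id: "account1" }, { $inc: { balance: -100 } })
    accounts.updateOne({ _id: "account2" }, { $inc: { balance: 100 } })
  })
} finally {
  session.endSession()
}
```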
**Read and Write Concerns:**
```javascript
// Write Concern: Acknowledgment level
db.orders.insertOne(
{ customer_id: "123", items: [...] },
{
writeConcern: {
w: "majority", // Wait for majority of replica set
j: true, // Wait for journal write
wtimeout: 5000 // Timeout after 5 seconds
}
}
)
// Read Concern: Data consistency level
db.orders.find(
{ status: "pending" }
).readConcern("majority") // Only return data acknowledged by majority
// Read Preference: Which replica to read from
db.orders.find({ ... }).readPref("secondary") // Read from secondary replica
```
**Atomic Operations (Single Document):**
```javascript
// Single document updates are atomic by default
db.counters.updateOne(
{ _id: "page_views" },
{
$inc: { count: 1 },
$set: { last_updated: new Date() }
}
)
// Atomic array operations
db.posts.updateOne(
{ _id: ObjectId("...") },
{
$push: {
comments: {
$each: [{ author: "John", text: "Great!" }],
$position: 0 // Insert at beginning
}
}
}
)
// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
{ status: "pending" },
{ $set: { status: "processing", processor_id: "worker-1" } },
{
sort: { priority: -1 },
returnDocument: "after" // Return updated document
}
)
```
## Replication
### PostgreSQL Replication
**Streaming Replication (Primary-Standby):**
```sql
-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on
-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
-- pg_hba.conf on primary
host replication replicator standby_ip/32 md5
-- Standby server: postgresql.auto.conf plus an empty standby.signal file
-- (recovery.conf was removed in PostgreSQL 12)
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
```
**Logical Replication (Selective Replication):**
```sql
-- On publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;
-- On subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;
-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
```
**Failover and Promotion:**
```sql
-- Promote standby to primary (shell command, not SQL):
-- pg_ctl promote -D /var/lib/postgresql/data
-- Check replication lag
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```
### MongoDB Replication
**Replica Set Configuration:**
```javascript
// Initialize replica set
rs.initiate({
_id: "myReplicaSet",
members: [
{ _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
{ _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
{ _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
]
})
// Add member to existing replica set
rs.add("mongodb4.example.com:27017")
// Remove member
rs.remove("mongodb4.example.com:27017")
// Check replica set status
rs.status()
// Check replication lag
rs.printSecondaryReplicationInfo()
```
**Replica Set Roles:**
```javascript
// Priority 0 member (cannot become primary)
rs.add({
host: "analytics.example.com:27017",
priority: 0,
hidden: true // Hidden from application drivers
})
// Arbiter (voting only, no data)
rs.addArb("arbiter.example.com:27017")
// Delayed member (disaster recovery)
rs.add({
host: "delayed.example.com:27017",
priority: 0,
hidden: true,
secondaryDelaySecs: 3600 // 1 hour behind (named slaveDelay before MongoDB 5.0)
})
```
**Read Preference Configuration:**
```javascript
// Application connection with read preference
const client = new MongoClient(uri, {
readPreference: "secondaryPreferred", // Try secondary, fallback to primary
readConcernLevel: "majority"
})
// Read Preference Modes:
// - primary (default): Read from primary only
// - primaryPreferred: Primary if available, else secondary
// - secondary: Read from secondary only
// - secondaryPreferred: Secondary if available, else primary
// - nearest: Read from nearest member (lowest latency)
```
## Sharding
### MongoDB Sharding Architecture
**Shard Key Selection:**
```javascript
// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)
// Example: User-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })
// Hashed shard key: Even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })
// Range-based shard key: Ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })
// Compound shard key
sh.shardCollection("mydb.orders", {
customer_region: 1, // Coarse grouping
order_date: 1 // Fine grouping
})
```
**Sharding Setup:**
```javascript
// 1. Start config servers (replica set)
mongod --configsvr --replSet configRS --port 27019
// 2. Initialize config server replica set
rs.initiate({
_id: "configRS",
configsvr: true,
members: [
{ _id: 0, host: "cfg1.example.com:27019" },
{ _id: 1, host: "cfg2.example.com:27019" },
{ _id: 2, host: "cfg3.example.com:27019" }
]
})
// 3. Start shard servers (each is a replica set)
mongod --shardsvr --replSet shard1RS --port 27018
// 4. Start mongos (query router)
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019
// 5. Add shards to cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")
// 6. Enable sharding on database
sh.enableSharding("mydb")
// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })
```
**Query Targeting:**
```javascript
// Targeted query (includes shard key)
db.users.find({ user_id: "12345" })
// Routes to single shard
// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })
// Queries all shards, merges results
// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "ALL_SHARDS"
```
**Zone Sharding (Geographic Distribution):**
```javascript
// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")
// Define zone ranges
sh.updateZoneKeyRange(
"mydb.users",
{ region: "US", user_id: MinKey },
{ region: "US", user_id: MaxKey },
"US"
)
sh.updateZoneKeyRange(
"mydb.users",
{ region: "EU", user_id: MinKey },
{ region: "EU", user_id: MaxKey },
"EU"
)
// Shard collection with zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })
```
### PostgreSQL Horizontal Partitioning
**Declarative Partitioning:**
```sql
-- Range partitioning
CREATE TABLE logs (
id BIGSERIAL,
log_time TIMESTAMP NOT NULL,
message TEXT,
level VARCHAR(10)
) PARTITION BY RANGE (log_time);
-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- List partitioning
CREATE TABLE customers (
id SERIAL,
name VARCHAR(255),
region VARCHAR(50)
) PARTITION BY LIST (region);
CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');
-- Hash partitioning
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50),
data JSONB
) PARTITION BY HASH (id);
CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3
```
**Partition Pruning (Query Optimization):**
```sql
-- Query automatically uses only relevant partition
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans logs_2025_01 partition
-- Check query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned
```
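Two maintenance details worth knowing, assuming the `logs` table above:
```sql
-- A default partition catches rows that fall outside every defined
-- range instead of failing the insert (PostgreSQL 11+)
CREATE TABLE logs_default PARTITION OF logs DEFAULT;

-- Detach an old partition for archiving without blocking concurrent
-- queries (PostgreSQL 14+)
ALTER TABLE logs DETACH PARTITION logs_2025_01 CONCURRENTLY;
```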
## Performance Tuning
### Query Optimization Techniques
**PostgreSQL Query Analysis:**
```sql
-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;
-- Identify slow queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Table statistics
ANALYZE users; -- Update query planner statistics
-- Vacuum and analyze
VACUUM ANALYZE posts; -- Reclaim space and update stats
```
**Common Query Patterns:**
```sql
-- Avoid SELECT * (retrieve only needed columns)
-- BAD
SELECT * FROM users WHERE id = 123;
-- GOOD
SELECT id, username, email FROM users WHERE id = 123;
-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts WHERE author_id IN (
SELECT id FROM users WHERE active = true
);
-- GOOD
SELECT * FROM posts p WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = p.author_id AND u.active = true
);
-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem)
-- SELECT * FROM posts;
-- Then for each post: SELECT * FROM users WHERE id = post.author_id;
-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
-- Window functions instead of self-joins
-- Calculate running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;
-- Rank within groups
SELECT
category,
product_name,
sales,
RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;
```
**MongoDB Query Optimization:**
```javascript
// Use projection to limit returned fields
// BAD
db.users.find({ active: true })
// GOOD
db.users.find(
{ active: true },
{ username: 1, email: 1, _id: 0 }
)
// Use covered queries (index covers all fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
{ username: "john_doe" },
{ username: 1, email: 1, _id: 0 }
) // Entire query served from index
// Avoid negation operators
// BAD (cannot use index efficiently)
db.products.find({ status: { $ne: "discontinued" } })
// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })
// Use $lookup sparingly (expensive operation)
// Consider embedding data instead if appropriate
// Aggregation optimization: Filter early
// BAD
db.orders.aggregate([
{ $lookup: { ... } }, // Expensive join
{ $match: { status: "completed" } } // Filter after join
])
// GOOD
db.orders.aggregate([
{ $match: { status: "completed" } }, // Filter first
{ $lookup: { ... } } // Join fewer documents
])
```
### Connection Pooling
**PostgreSQL Connection Pooling:**
```javascript
// Using node-postgres (pg) with pool
const { Pool } = require('pg')
const pool = new Pool({
host: 'localhost',
port: 5432,
database: 'mydb',
user: 'dbuser',
password: 'secret',
max: 20, // Maximum pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000
})
// Execute query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])
// Use PgBouncer for server-side pooling
// pgbouncer.ini
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25
```
**MongoDB Connection Pooling:**
```javascript
// MongoClient automatically manages connection pool
const { MongoClient } = require('mongodb')
const client = new MongoClient(uri, {
maxPoolSize: 50, // Max connections
minPoolSize: 10, // Min connections
maxIdleTimeMS: 30000, // Close idle connections
waitQueueTimeoutMS: 5000 // Wait for available connection
})
await client.connect()
const db = client.db('mydb')
// Connection automatically returned to pool after use
```
## Best Practices
### PostgreSQL Best Practices
1. **Schema Design**
- Normalize for data integrity, denormalize for performance
- Use appropriate data types (avoid TEXT for short strings)
- Define NOT NULL constraints where appropriate
- Use identity/SERIAL columns or UUIDs for primary keys consistently
2. **Indexing**
- Index foreign keys for JOIN performance
- Create indexes on frequently filtered/sorted columns
- Use partial indexes for selective queries
- Monitor and remove unused indexes
- Keep composite index column count reasonable (typically ≤ 3-4)
3. **Query Performance**
- Use EXPLAIN ANALYZE to understand query plans
- Avoid SELECT * in application code
- Use prepared statements to prevent SQL injection
- Limit result sets with LIMIT
- Use connection pooling
4. **Maintenance**
- Run VACUUM regularly (or enable autovacuum)
- Update statistics with ANALYZE
- Monitor slow query log
- Set appropriate autovacuum thresholds
- Regular backup with pg_dump or WAL archiving
5. **Security**
- Use SSL/TLS for connections
- Implement row-level security for multi-tenant apps (see the sketch after this list)
- Grant minimum necessary privileges
- Use parameterized queries
- Regular security updates
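A minimal row-level security sketch for the multi-tenant item above, assuming a `tenant_id` column and an application-defined `app.current_tenant` setting:
```sql
-- Rows become invisible to queries unless the policy passes
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::int);
-- The application sets the tenant per connection (or per transaction
-- with SET LOCAL) before running queries
SET app.current_tenant = '42';
```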
### MongoDB Best Practices
1. **Schema Design**
- Embed related data that is accessed together
- Reference data that is large or rarely accessed
- Use polymorphic pattern for varied schemas
- Limit document size to reasonable bounds (< 1-2 MB typically)
- Design for your query patterns
2. **Indexing**
- Index on fields used in queries and sorts
- Use compound indexes with ESR rule (Equality, Sort, Range)
- Create text indexes for full-text search
- Monitor index usage with $indexStats
- Avoid too many indexes (write performance impact)
3. **Query Performance**
- Use projection to limit returned fields
- Create covered queries when possible
- Filter early in aggregation pipelines
- Avoid $lookup when embedding is appropriate
- Use explain() to verify index usage
4. **Scalability**
- Choose appropriate shard key (high cardinality, even distribution)
- Use replica sets for high availability
- Configure appropriate read/write concerns
- Monitor chunk distribution in sharded clusters
- Use zones for geographic distribution
5. **Operations**
- Enable authentication and authorization
- Use TLS for client connections
- Regular backups (mongodump or filesystem snapshots)
- Monitor with MongoDB Atlas, Ops Manager, or custom tools
- Keep MongoDB version updated
### Data Modeling Decision Framework
**Choose PostgreSQL when:**
- Strong ACID guarantees required (financial transactions)
- Complex relationships with many JOINs
- Data structure is well-defined and stable
- Need for advanced SQL features (window functions, CTEs, stored procedures)
- Compliance requirements demand strict consistency
**Choose MongoDB when:**
- Schema flexibility needed (rapid development, evolving requirements)
- Horizontal scalability is priority (sharding required)
- Document-oriented data (JSON/BSON native format)
- Hierarchical or nested data structures
- High write throughput with eventual consistency acceptable
**Hybrid Approach:**
- Use both databases for different parts of application
- PostgreSQL for transactional data (orders, payments)
- MongoDB for catalog, logs, user sessions
- Synchronize critical data between systems
## Common Patterns and Anti-Patterns
### PostgreSQL Anti-Patterns
❌ **Storing JSON when relational fits better**
```sql
-- BAD: Using JSONB for structured, queryable data
CREATE TABLE users (
id SERIAL PRIMARY KEY,
data JSONB -- { name, email, address: { street, city, state } }
);
-- GOOD: Proper normalization
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
street VARCHAR(255),
city VARCHAR(100),
state VARCHAR(50)
);
```
❌ **Over-indexing**
```sql
-- BAD: Index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: Slow writes, large database size
-- GOOD: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email); -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- Partial
```
❌ **N+1 Query Problem**
```sql
-- BAD: Multiple queries in loop
SELECT * FROM posts; -- Returns 100 posts
-- Then for each post:
SELECT * FROM users WHERE id = ?; -- 100 additional queries!
-- GOOD: Single query with JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
```
### MongoDB Anti-Patterns
❌ **Massive arrays in documents**
```javascript
// BAD: Unbounded array growth
{
_id: ObjectId("..."),
username: "popular_user",
followers: [
ObjectId("follower1"),
ObjectId("follower2"),
// ... 100,000+ follower IDs
// Document exceeds 16MB limit!
]
}
// GOOD: Separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }
// followers collection
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })
```
❌ **Poor shard key selection**
```javascript
// BAD: Monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to same shard (highest _id range)
// BAD: Low cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in few countries = uneven distribution
// GOOD: Hashed _id or compound key
sh.shardCollection("mydb.events", { _id: "hashed" }) // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // Compound
```
❌ **Ignoring indexes on embedded documents**
```javascript
// Document structure
{
username: "john_doe",
profile: {
email: "[email protected]",
age: 30,
city: "San Francisco"
}
}
// Query on embedded field
db.users.find({ "profile.email": "[email protected]" })
// MISSING: Index on embedded field
db.users.createIndex({ "profile.email": 1 })
```
## Troubleshooting Guide
### PostgreSQL Issues
**Slow Queries:**
```sql
-- Enable slow query logging (postgresql.conf)
-- log_min_duration_statement = 1000 # Log queries > 1 second
-- Find slow queries
SELECT
query,
calls,
total_exec_time / calls as avg_time_ms,
rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;
-- Analyze specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
```
**High CPU Usage:**
```sql
-- Check running queries
SELECT
pid,
now() - query_start as duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Terminate long-running query
SELECT pg_terminate_backend(pid);
```
**Lock Contention:**
```sql
-- View locks
SELECT
locktype,
relation::regclass,
mode,
granted,
pid
FROM pg_locks
WHERE NOT granted;
-- Find blocking queries (pg_blocking_pids, PostgreSQL 9.6+);
-- joining pg_locks on locktype alone would cross-match unrelated sessions
SELECT
blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
blocked.query AS blocked_query,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
```
### MongoDB Issues
**Slow Queries:**
```javascript
// Enable profiling
db.setProfilingLevel(1, { slowms: 100 }) // Log queries > 100ms
// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)
// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)
```
**Replication Lag:**
```javascript
// Check lag on secondary
rs.printSecondaryReplicationInfo()
// Check oplog size
db.getReplicationInfo()
// Increase oplog size if needed
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB
```
**Sharding Issues:**
```javascript
// Check chunk distribution
sh.status()
// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()
// Balance specific collection
sh.enableBalancing("mydb.mycollection")
// Check for jumbo chunks (chunk metadata lives in the config database)
db.getSiblingDB("config").chunks.find({ jumbo: true })
```
## Resources
### PostgreSQL Resources
- Official Documentation: https://www.postgresql.org/docs/
- PostgreSQL Wiki: https://wiki.postgresql.org/
- Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
- Explain Visualizer: https://explain.dalibo.com/
- pg_stat_statements Extension: Essential for query analysis
### MongoDB Resources
- Official Documentation: https://docs.mongodb.com/
- MongoDB University: Free courses and certification
- Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
- Sharding Guide: https://docs.mongodb.com/manual/sharding/
- Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary
### Books
- PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
- MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow
---
**Skill Version**: 1.0.0
**Last Updated**: January 2025
**Skill Category**: Database Management, Data Architecture, Performance Optimization
**Technologies**: PostgreSQL 16+, MongoDB 7+
## FAQ
**When should I denormalize a relational schema?**
Denormalize when joins are the main performance bottleneck and access patterns are read-heavy enough to benefit from fewer joins; keep strong tests and explicit update paths to maintain consistency.
**How do I choose embedding vs referencing in MongoDB?**
Embed tightly coupled, bounded child data that is accessed together with its parent; reference when child data grows without bound or must be queried independently.