home / skills / microck / ordinary-claude-skills / pg-dump-reference

pg-dump-reference skill

/skills_all/pg-dump-reference

This skill guides implementing pgschema by aligning system catalog queries and DDL patterns with pg_dump references for PostgreSQL schemas.

npx playbooks add skill microck/ordinary-claude-skills --skill pg-dump-reference

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
8.6 KB
---
name: pg_dump Reference
description: Consult PostgreSQL's pg_dump implementation for guidance on system catalog queries and schema extraction when implementing pgschema features
---

# pg_dump Reference

Use this skill when implementing or debugging pgschema features that involve extracting schema information from PostgreSQL databases. pg_dump is the canonical PostgreSQL schema dumping tool and serves as a reference implementation for how to query system catalogs correctly.

## When to Use This Skill

Invoke this skill when:
- Adding support for new PostgreSQL schema objects
- Debugging system catalog queries in `ir/inspector.go`
- Understanding how PostgreSQL represents objects internally
- Handling version-specific PostgreSQL features (versions 14-17)
- Learning correct DDL formatting patterns
- Understanding object dependency relationships

## Source Code Locations

**Main pg_dump repository**: https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/

**Key files to reference**:
- `pg_dump.c` - Main implementation with system catalog queries
- `pg_dump.h` - Data structures and function declarations
- `pg_dump_sort.c` - Dependency sorting logic
- `pg_backup_archiver.c` - Output formatting
- `common.c` - Shared utility functions for querying system catalogs

## Step-by-Step Workflow

### 1. Identify the Schema Object

Determine which PostgreSQL object type you're working with:
- Tables and columns
- Constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK)
- Indexes (regular, unique, partial, functional)
- Triggers (including WHEN conditions, constraint triggers)
- Views and materialized views
- Functions and procedures
- Sequences
- Types (enum, composite, domain)
- Policies (row-level security)
- Aggregates
- Comments

### 2. Find the Relevant pg_dump Function

Search pg_dump.c for the function that handles your object type:

| Object Type | pg_dump Function | System Catalogs Used |
|-------------|------------------|---------------------|
| Tables & Columns | `getTables()` | `pg_class`, `pg_attribute`, `pg_type` |
| Indexes | `getIndexes()` | `pg_index`, `pg_class` |
| Triggers | `getTriggers()` | `pg_trigger`, `pg_proc` |
| Functions | `getFuncs()` | `pg_proc` |
| Procedures | `getProcs()` | `pg_proc` |
| Views | `getViews()` | `pg_class`, `pg_rewrite` |
| Materialized Views | `getMatViews()` | `pg_class` |
| Sequences | `getSequences()` | `pg_sequence`, `pg_class` |
| Constraints | `getConstraints()` | `pg_constraint` |
| Policies | `getPolicies()` | `pg_policy` |
| Aggregates | `getAggregates()` | `pg_aggregate`, `pg_proc` |
| Types | `getTypes()` | `pg_type` |
| Comments | `getComments()` | `pg_description` |

### 3. Analyze the System Catalog Query

Examine the SQL query used by pg_dump:
- Which system catalog tables are joined
- Which columns are selected
- How version-specific features are handled
- How PostgreSQL internal functions are used (`pg_get_expr`, `pg_get_constraintdef`, etc.)

**Example - Extracting trigger WHEN conditions**:

```sql
-- pg_dump's approach (from getTriggers):
SELECT t.tgname,
       pg_get_expr(t.tgqual, t.tgrelid, false) as when_clause
FROM pg_catalog.pg_trigger t
WHERE t.tgqual IS NOT NULL
```

Note: `information_schema.triggers.action_condition` is NOT reliable for WHEN clauses. Always use `pg_get_expr(t.tgqual, ...)` from `pg_catalog.pg_trigger`.

### 4. Check for Special Cases

Look for how pg_dump handles:
- **Version compatibility**: Different queries for different PostgreSQL versions
- **NULL handling**: How missing values are interpreted
- **Default values**: System vs. user-defined defaults
- **Internal objects**: Filtering out system-generated objects
- **Dependencies**: How object relationships are tracked

### 5. Adapt for pgschema

Apply the pattern to pgschema's codebase:

**For database introspection** (`ir/inspector.go`):
- Adapt the system catalog query for Go/pgx
- Use pgx parameter binding for safety
- Handle NULL values appropriately
- Add proper error handling

**For SQL parsing** (`ir/parser.go`):
- Understand how pg_dump formats DDL
- Use pg_query_go to parse SQL statements
- Extract relevant fields into IR structures

**For DDL generation** (`internal/diff/*.go`):
- Follow pg_dump's quoting rules
- Use PostgreSQL functions for formatting complex expressions
- Handle version-specific syntax

## Key System Catalog Tables

### Core Tables
- `pg_class` - Tables, indexes, views, sequences
- `pg_attribute` - Table columns
- `pg_type` - Data types
- `pg_constraint` - Constraints (PK, FK, UNIQUE, CHECK)
- `pg_index` - Index definitions

### Functions & Triggers
- `pg_proc` - Functions, procedures, trigger functions
- `pg_trigger` - Trigger definitions
- `pg_aggregate` - Aggregate function definitions

### Access Control
- `pg_policy` - Row-level security policies

### Metadata
- `pg_description` - Comments on database objects
- `pg_depend` - Object dependencies

### Helper Functions
- `pg_get_expr(expr, relation, pretty)` - Deparse expressions
- `pg_get_constraintdef(constraint_oid, pretty)` - Get constraint definition
- `pg_get_indexdef(index_oid, column, pretty)` - Get index definition
- `pg_get_triggerdef(trigger_oid, pretty)` - Get trigger definition

## Important Considerations

### pgschema is NOT pg_dump

**Key differences**:
- **Format**: pgschema outputs declarative schema files for editing, pg_dump creates archive dumps for restore
- **Scope**: pgschema focuses on single-schema objects, pg_dump handles entire databases
- **Workflow**: pgschema supports plan/apply (Terraform-style), pg_dump is dump/restore only
- **Normalization**: pgschema normalizes for comparison, pg_dump preserves exact format

### When NOT to Copy pg_dump

Don't blindly copy pg_dump for:
- Output formatting (pgschema has different conventions)
- Archive/restore logic (not applicable to pgschema)
- Full database dumps (pgschema is schema-focused)
- Ancient version support (pgschema supports PostgreSQL 14+)

### When pg_dump is Authoritative

Always reference pg_dump for:
- System catalog query patterns
- Understanding PostgreSQL internals
- Correct use of `pg_get_*` functions
- Version-specific feature detection
- Object dependency tracking

## Examples

### Example 1: Extracting Generated Column Information

**pg_dump approach**:
```sql
SELECT a.attname,
       a.attgenerated,
       pg_get_expr(ad.adbin, ad.adrelid) as generation_expr
FROM pg_attribute a
LEFT JOIN pg_attrdef ad ON (a.attrelid = ad.adrelid AND a.attnum = ad.adnum)
WHERE a.attgenerated != ''
```

**pgschema adaptation** (in `ir/inspector.go`):
```go
query := `
SELECT a.attname,
       a.attgenerated,
       pg_get_expr(ad.adbin, ad.adrelid) as generation_expr
FROM pg_attribute a
LEFT JOIN pg_attrdef ad ON (a.attrelid = ad.adrelid AND a.attnum = ad.adnum)
WHERE a.attrelid = $1 AND a.attgenerated != ''
`
rows, err := conn.Query(ctx, query, tableOID)
```

### Example 2: Handling Partial Indexes

**pg_dump extracts WHERE clauses**:
```sql
SELECT pg_get_expr(i.indpred, i.indrelid, true) as index_predicate
FROM pg_index i
WHERE i.indpred IS NOT NULL
```

**pgschema stores in IR** (`ir/ir.go`):
```go
type Index struct {
    Name      string
    Columns   []string
    Predicate string  // WHERE clause for partial indexes
    // ...
}
```

## Tips for Success

1. **Search strategically**: Clone postgres repo and use grep/ag to search for specific system catalog columns or keywords

2. **Check git history**: Use `git log -p` or GitHub blame to see when features were added and understand the evolution

3. **Read comments carefully**: pg_dump.c contains valuable comments explaining PostgreSQL internals and edge cases

4. **Cross-reference documentation**: Always combine pg_dump source with official PostgreSQL documentation:
   - System catalogs: https://www.postgresql.org/docs/current/catalogs.html
   - Functions: https://www.postgresql.org/docs/current/functions-info.html

5. **Test incrementally**: After adapting from pg_dump, test against real PostgreSQL instances using pgschema's embedded-postgres integration tests

6. **Version awareness**: Check how pg_dump handles version differences - pgschema supports PostgreSQL 14-17, so you may need conditional logic

## Verification Checklist

After consulting pg_dump and implementing in pgschema:

- [ ] System catalog query correctly extracts all necessary fields
- [ ] NULL values are handled appropriately
- [ ] Version-specific features are detected and handled
- [ ] Internal/system objects are filtered out
- [ ] Dependencies are tracked correctly
- [ ] Integration test added in `testdata/diff/`
- [ ] Test passes with `go test -v ./internal/diff -run TestDiffFromFiles`
- [ ] Test passes with `go test -v ./cmd -run TestPlanAndApply`
- [ ] Tested against multiple PostgreSQL versions (14-17)

Overview

This skill provides a concise reference to PostgreSQL's pg_dump implementation to guide extracting schema information and querying system catalogs when implementing pgschema features. It highlights where pg_dump queries system catalogs, which helper functions to use, and how to adapt patterns for Go/pgx-based introspection and DDL generation.

How this skill works

The skill points you to pg_dump source files and the specific functions that handle each object type. It explains which system catalog tables and pg_get_* helper functions pg_dump uses, and shows example queries and patterns to adapt into pgschema's inspector, parser, and diff code. It emphasizes version-aware queries, NULL handling, and dependency sorting logic.

When to use it

  • Adding support for a new PostgreSQL object type (indexes, triggers, types, policies, etc.)
  • Debugging or rewriting system catalog queries in ir/inspector.go
  • Implementing correct expression deparsing (WHEN, defaults, predicates) using pg_get_expr and friends
  • Handling PostgreSQL version differences (supported range: 14–17)
  • Designing dependency-aware DDL generation or normalization for pgschema

Best practices

  • Follow pg_dump query patterns for selecting and joining system catalogs rather than relying on information_schema
  • Use pg_get_expr/pg_get_constraintdef/pg_get_indexdef to obtain canonical expressions and definitions
  • Adapt queries to pgx with parameter binding and explicit NULL handling
  • Filter out internal/system objects as pg_dump does and respect version-specific columns
  • Test incrementally against real PostgreSQL instances and add integration tests for each change

Example use cases

  • Extract trigger WHEN clauses using pg_get_expr(t.tgqual, t.tgrelid, false) instead of information_schema
  • Retrieve generated column expressions and store them in the IR with a query adapted from pg_dump
  • Capture partial index predicates with pg_get_expr(i.indpred, i.indrelid, true) and map to Index.Predicate
  • Implement dependency sorting logic inspired by pg_dump_sort.c for reliable DDL ordering
  • Detect and handle new type metadata by following getTypes() and pg_type usage in pg_dump

FAQ

Is it safe to copy pg_dump queries verbatim?

Use pg_dump queries as authoritative patterns for catalog access, but adapt formatting, error handling, and output conventions to pgschema's goals rather than copying output formatting or archive logic.

Which helper functions should I prefer for complex expressions?

Prefer pg_get_expr, pg_get_constraintdef, pg_get_indexdef, and pg_get_triggerdef to deparse expressions consistently across versions.