home / skills / amnadtaowsoam / cerebraskills / dbt-patterns
This skill provides a practical guide to dbt patterns, modeling layers, tests, and production workflows for reliable analytics transformations.
npx playbooks add skill amnadtaowsoam/cerebraskills --skill dbt-patternsReview the files below or copy the command above to add this skill to your agents.
---
name: dbt Patterns
description: Comprehensive guide to dbt (data build tool) patterns, modeling best practices, testing strategies, and production workflows for modern data transformation
---
# dbt Patterns
## What is dbt?
**dbt (data build tool):** SQL-first transformation tool that enables analytics engineers to transform data in the warehouse using SELECT statements.
### Core Concept
```
Raw Data (Extract & Load) → dbt (Transform) → Analytics-Ready Data
Traditional ETL: Extract → Transform → Load
Modern ELT: Extract → Load → Transform (with dbt)
```
### Why dbt?
- **Version control:** SQL as code (Git)
- **Testing:** Built-in data quality tests
- **Documentation:** Auto-generated docs
- **Modularity:** Reusable models
- **Lineage:** Visual data lineage
- **Collaboration:** Team workflows
---
## dbt Project Structure
### Standard Layout
```
my_dbt_project/
├── dbt_project.yml # Project configuration
├── profiles.yml # Connection profiles
├── models/ # SQL models
│ ├── staging/ # Raw data cleaning
│ ├── intermediate/ # Business logic
│ └── marts/ # Final analytics tables
├── tests/ # Custom tests
├── macros/ # Reusable SQL
├── seeds/ # CSV reference data
├── snapshots/ # SCD Type 2
└── analyses/ # Ad-hoc queries
```
### dbt_project.yml
```yaml
name: 'my_project'
version: '1.0.0'
config-version: 2
profile: 'my_profile'
model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
models:
my_project:
staging:
+materialized: view
+schema: staging
intermediate:
+materialized: ephemeral
marts:
+materialized: table
+schema: analytics
```
---
## Model Layers (Staging → Intermediate → Marts)
### Staging Layer
**Purpose:** Clean and standardize raw data
**Pattern:**
```sql
-- models/staging/stg_orders.sql
with source as (
select * from {{ source('raw', 'orders') }}
),
renamed as (
select
id as order_id,
user_id,
created_at as order_created_at,
status as order_status,
total_amount
from source
)
select * from renamed
```
**Best Practices:**
- One staging model per source table
- Rename columns to consistent naming
- Cast data types
- No business logic
- Materialized as views (lightweight)
### Intermediate Layer
**Purpose:** Business logic and transformations
**Pattern:**
```sql
-- models/intermediate/int_orders_with_customer.sql
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('stg_customers') }}
),
joined as (
select
orders.*,
customers.customer_name,
customers.customer_segment
from orders
left join customers
on orders.user_id = customers.customer_id
)
select * from joined
```
**Best Practices:**
- Complex joins
- Business logic
- Calculations
- Materialized as ephemeral (not persisted)
### Marts Layer
**Purpose:** Final analytics-ready tables
**Pattern:**
```sql
-- models/marts/fct_orders.sql
with orders as (
select * from {{ ref('int_orders_with_customer') }}
),
aggregated as (
select
order_id,
user_id,
customer_name,
order_created_at,
order_status,
total_amount,
case
when order_status = 'completed' then total_amount
else 0
end as completed_revenue
from orders
)
select * from aggregated
```
**Best Practices:**
- Business-friendly naming
- Denormalized for analytics
- Materialized as tables (fast queries)
- Documented
---
## Naming Conventions
### Model Naming
```
Staging: stg_<source>_<table>
stg_salesforce_accounts
stg_stripe_payments
Intermediate: int_<entity>_<verb>
int_orders_joined
int_customers_enriched
Facts: fct_<entity>
fct_orders
fct_revenue
Dimensions: dim_<entity>
dim_customers
dim_products
```
### Column Naming
```
IDs: <entity>_id
customer_id, order_id
Dates: <entity>_<verb>_at
order_created_at, customer_updated_at
Booleans: is_<condition> or has_<condition>
is_active, has_subscription
Amounts: <entity>_amount
order_amount, refund_amount
```
---
## Materializations
### View
```sql
{{ config(materialized='view') }}
select * from {{ ref('stg_orders') }}
```
- **Pros:** Always fresh, no storage
- **Cons:** Slow queries (recomputed each time)
- **Use for:** Staging models, rarely queried
### Table
```sql
{{ config(materialized='table') }}
select * from {{ ref('fct_orders') }}
```
- **Pros:** Fast queries
- **Cons:** Storage cost, stale data
- **Use for:** Marts, frequently queried
### Incremental
```sql
{{ config(
materialized='incremental',
unique_key='order_id'
) }}
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where order_created_at > (select max(order_created_at) from {{ this }})
{% endif %}
```
- **Pros:** Fast builds, handles large data
- **Cons:** Complex logic
- **Use for:** Large fact tables
### Ephemeral
```sql
{{ config(materialized='ephemeral') }}
select * from {{ ref('stg_orders') }}
```
- **Pros:** No storage, CTE in downstream models
- **Cons:** Recomputed in each downstream model
- **Use for:** Intermediate models
---
## Testing
### Schema Tests
```yaml
# models/schema.yml
version: 2
models:
- name: fct_orders
description: "Order facts table"
columns:
- name: order_id
description: "Unique order identifier"
tests:
- unique
- not_null
- name: user_id
description: "Customer who placed order"
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: order_status
tests:
- accepted_values:
values: ['pending', 'completed', 'cancelled']
- name: total_amount
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
```
### Custom Tests
```sql
-- tests/assert_positive_revenue.sql
select *
from {{ ref('fct_orders') }}
where total_amount < 0
```
### Test Types
- **unique:** No duplicates
- **not_null:** No nulls
- **accepted_values:** Value in list
- **relationships:** Foreign key check
- **custom:** SQL query returns 0 rows
---
## Macros
### Reusable SQL
```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::decimal(10,2)
{% endmacro %}
```
**Usage:**
```sql
select
order_id,
{{ cents_to_dollars('total_amount_cents') }} as total_amount_dollars
from {{ ref('stg_orders') }}
```
### Common Macros
```sql
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}
{%- if custom_schema_name is none -%}
{{ target.schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{% endmacro %}
```
---
## Sources
### Defining Sources
```yaml
# models/staging/sources.yml
version: 2
sources:
- name: raw
database: analytics_db
schema: raw_data
tables:
- name: orders
description: "Raw orders from production DB"
columns:
- name: id
tests:
- unique
- not_null
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _loaded_at
```
### Using Sources
```sql
select * from {{ source('raw', 'orders') }}
```
### Source Freshness
```bash
dbt source freshness
```
---
## Documentation
### Model Documentation
```yaml
# models/schema.yml
version: 2
models:
- name: fct_orders
description: |
Order facts table containing all orders with customer information.
This table is updated daily at 2 AM UTC.
**Business Rules:**
- Only includes orders with status 'completed' or 'pending'
- Cancelled orders are excluded
columns:
- name: order_id
description: "Unique identifier for each order"
- name: total_amount
description: "Total order amount in dollars"
```
### Generate Docs
```bash
dbt docs generate
dbt docs serve
```
**Output:** Interactive documentation website with:
- Model descriptions
- Column descriptions
- Data lineage (DAG)
- Source freshness
---
## Snapshots (SCD Type 2)
### Snapshot Configuration
```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}
select * from {{ source('raw', 'customers') }}
{% endsnapshot %}
```
### Run Snapshots
```bash
dbt snapshot
```
### Output
```
customer_id | name | updated_at | dbt_valid_from | dbt_valid_to | dbt_scd_id
1 | John | 2024-01-01 | 2024-01-01 | 2024-01-15 | abc123
1 | John | 2024-01-15 | 2024-01-15 | null | def456
```
---
## Incremental Models
### Basic Incremental
```sql
{{ config(
materialized='incremental',
unique_key='order_id'
) }}
select
order_id,
user_id,
order_created_at,
total_amount
from {{ ref('stg_orders') }}
{% if is_incremental() %}
-- Only process new/updated records
where order_created_at > (select max(order_created_at) from {{ this }})
{% endif %}
```
### Incremental with Delete+Insert
```sql
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='delete+insert'
) }}
```
### Incremental with Merge
```sql
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
merge_update_columns=['order_status', 'total_amount']
) }}
```
---
## Packages
### Installing Packages
```yaml
# packages.yml
packages:
- package: dbt-labs/dbt_utils
version: 1.1.0
- package: calogica/dbt_expectations
version: 0.9.0
- package: dbt-labs/codegen
version: 0.11.0
```
```bash
dbt deps
```
### Using Packages
```sql
-- dbt_utils
select
{{ dbt_utils.surrogate_key(['order_id', 'user_id']) }} as unique_key,
*
from {{ ref('stg_orders') }}
-- dbt_expectations
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1000000
```
---
## Production Workflows
### Development Workflow
```bash
# 1. Create feature branch
git checkout -b feature/new-model
# 2. Develop model
# Edit models/marts/fct_new_model.sql
# 3. Run model
dbt run --select fct_new_model
# 4. Test model
dbt test --select fct_new_model
# 5. Document model
# Edit models/schema.yml
# 6. Commit and push
git add .
git commit -m "Add new model"
git push origin feature/new-model
# 7. Create PR
# Review, approve, merge
```
### CI/CD Pipeline
```yaml
# .github/workflows/dbt_ci.yml
name: dbt CI
on: [pull_request]
jobs:
dbt_run:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: '3.9'
- name: Install dbt
run: pip install dbt-snowflake
- name: Run dbt
run: |
dbt deps
dbt run --select state:modified+ --defer --state ./prod_manifest
dbt test --select state:modified+ --defer --state ./prod_manifest
```
### Production Deployment
```bash
# Daily production run
dbt run --target prod
dbt test --target prod
dbt source freshness --target prod
```
---
## Performance Optimization
### Optimize Queries
```sql
-- Bad: Multiple CTEs with same source
with orders_1 as (
select * from {{ ref('stg_orders') }}
where status = 'completed'
),
orders_2 as (
select * from {{ ref('stg_orders') }}
where status = 'pending'
)
-- Good: Single CTE, filter later
with orders as (
select * from {{ ref('stg_orders') }}
where status in ('completed', 'pending')
),
completed_orders as (
select * from orders where status = 'completed'
),
pending_orders as (
select * from orders where status = 'pending'
)
```
### Use Incremental Models
```sql
-- For large tables (millions of rows)
{{ config(materialized='incremental') }}
```
### Partition Tables
```sql
{{ config(
materialized='table',
partition_by={
"field": "order_date",
"data_type": "date",
"granularity": "day"
}
) }}
```
---
## Best Practices
### 1. One Model Per File
```
✓ models/staging/stg_orders.sql
✗ models/staging/all_staging_models.sql
```
### 2. Use CTEs, Not Subqueries
```sql
-- Good
with orders as (
select * from {{ ref('stg_orders') }}
)
select * from orders
-- Bad
select * from (
select * from {{ ref('stg_orders') }}
) as orders
```
### 3. Explicit Column Selection
```sql
-- Good
select
order_id,
user_id,
total_amount
from {{ ref('stg_orders') }}
-- Bad
select * from {{ ref('stg_orders') }}
```
### 4. Test Everything
```yaml
# Every model should have tests
models:
- name: fct_orders
columns:
- name: order_id
tests: [unique, not_null]
```
### 5. Document Everything
```yaml
# Every model and column should have description
models:
- name: fct_orders
description: "Order facts table"
columns:
- name: order_id
description: "Unique order ID"
```
---
## Common Patterns
### Slowly Changing Dimensions (SCD)
```sql
-- Type 1: Overwrite
{{ config(materialized='table') }}
select * from {{ ref('stg_customers') }}
-- Type 2: Historical tracking
{% snapshot customers_snapshot %}
{{ config(strategy='timestamp', updated_at='updated_at') }}
select * from {{ ref('stg_customers') }}
{% endsnapshot %}
```
### Fact Tables
```sql
-- Transactional facts
{{ config(materialized='incremental', unique_key='order_id') }}
select
order_id,
customer_id,
product_id,
order_date,
quantity,
amount
from {{ ref('stg_orders') }}
```
### Dimension Tables
```sql
-- Dimension table
{{ config(materialized='table') }}
select
customer_id,
customer_name,
customer_segment,
customer_region
from {{ ref('stg_customers') }}
```
---
## Troubleshooting
### Common Errors
**Compilation Error:**
```
Compilation Error in model fct_orders
Model 'stg_orders' not found
```
**Fix:** Check model name, ensure it exists
**Test Failure:**
```
Failure in test unique_fct_orders_order_id
Got 5 results, expected 0
```
**Fix:** Investigate duplicate order_ids
**Freshness Error:**
```
Source 'raw.orders' is stale (loaded 25 hours ago)
```
**Fix:** Check ETL pipeline, data loading
---
## Summary
**dbt:** SQL-first transformation tool
**Layers:**
- Staging: Clean raw data
- Intermediate: Business logic
- Marts: Analytics-ready
**Materializations:**
- View: Always fresh, slow
- Table: Fast, stale
- Incremental: Large data
- Ephemeral: No storage
**Testing:**
- unique, not_null, accepted_values, relationships
- Custom SQL tests
**Best Practices:**
- One model per file
- Use CTEs
- Test everything
- Document everything
- Version control
**Workflow:**
- Develop → Test → Document → PR → Merge → Deploy
This skill is a practical guide to dbt patterns for building reliable, maintainable data transformation pipelines. It condenses modeling layers, materializations, testing, macros, snapshots, incremental strategies, and production workflows into actionable guidance. Use it to standardize dbt projects, improve data quality, and accelerate deployments.
The skill inspects common dbt project structure and prescribes patterns for staging, intermediate, and marts layers, plus naming and materialization choices. It explains when to use views, tables, incremental and ephemeral models, how to implement tests and snapshots, and how to compose macros and CI/CD workflows. Recommendations focus on reproducible SQL, test coverage, and performance optimizations.
How do I choose materializations for each layer?
Use views for lightweight staging and rapid iteration, ephemeral for reusable logic not persisted, incremental for very large facts, and tables for final marts that need fast query performance.
When should I use snapshots vs incremental models?
Use snapshots for SCD Type 2 history and temporal tracking of source rows. Use incremental models to efficiently append or merge new/changed records in large fact tables.