
answering-natural-language-questions-with-dbt skill


This skill answers natural language business questions by querying the dbt semantic layer, then explores models or manifests if needed to provide an accurate answer.

npx playbooks add skill dbt-labs/dbt-agent-skills --skill answering-natural-language-questions-with-dbt

---
name: answering-natural-language-questions-with-dbt
description: Use when a user asks a business question that requires querying data (e.g., "What were total sales last quarter?"). NOT for validating, testing, or building dbt models during development.
user-invocable: false
metadata:
  author: dbt-labs
---

# Answering Natural Language Questions with dbt

## Overview

Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer."

**Use for:** Business questions from users that need data answers
- "What were total sales last month?"
- "How many active customers do we have?"
- "Show me revenue by region"

**Not for:**
- Validating model logic during development
- Testing dbt models or semantic layer definitions
- Building or modifying dbt models
- `dbt run`, `dbt test`, or `dbt build` workflows

## Decision Flow

```mermaid
flowchart TD
    start([Business question received])
    check_sl{Semantic layer tools available?}
    list_metrics[list_metrics]
    metric_exists{Relevant metric exists?}
    get_dims[get_dimensions]
    sl_sufficient{SL can answer directly?}
    query_metrics[query_metrics]
    answer([Return answer])
    try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql]
    check_discovery{Model discovery tools available?}
    try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute]
    check_manifest{In dbt project?}
    try_manifest[Analyze manifest/catalog<br/>Write SQL]
    cannot([Cannot answer])
    suggest{In dbt project?}
    improvements[Suggest semantic layer changes]
    done([Done])

    start --> check_sl
    check_sl -->|yes| list_metrics
    check_sl -->|no| check_discovery
    list_metrics --> metric_exists
    metric_exists -->|yes| get_dims
    metric_exists -->|no| check_discovery
    get_dims --> sl_sufficient
    sl_sufficient -->|yes| query_metrics
    sl_sufficient -->|no| try_compiled
    query_metrics --> answer
    try_compiled -->|success| answer
    try_compiled -->|fail| check_discovery
    check_discovery -->|yes| try_discovery
    check_discovery -->|no| check_manifest
    try_discovery -->|success| answer
    try_discovery -->|fail| check_manifest
    check_manifest -->|yes| try_manifest
    check_manifest -->|no| cannot
    try_manifest -->|SQL ready| answer
    answer --> suggest
    cannot --> done
    suggest -->|yes| improvements
    suggest -->|no| done
    improvements --> done
```

## Quick Reference

| Priority | Condition | Approach | Tools |
|----------|-----------|----------|-------|
| 1 | Semantic layer active | Query metrics directly | `list_metrics`, `get_dimensions`, `query_metrics` |
| 2 | SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation) | Modify compiled SQL | `get_metrics_compiled_sql`, then `execute_sql` |
| 3 | No SL, discovery tools active | Explore models, write SQL | `get_mart_models`, `get_model_details`, then `show`/`execute_sql` |
| 4 | No MCP, in dbt project | Analyze artifacts, write SQL | Read `target/manifest.json`, `target/catalog.json` |

## Approach 1: Semantic Layer Query

When `list_metrics` and `query_metrics` are available:

1. `list_metrics` - find relevant metric
2. `get_dimensions` - verify required dimensions exist
3. `query_metrics` - execute with appropriate filters

If the semantic layer can't answer directly (missing dimension, custom logic needed) → go to Approach 2.

## Approach 2: Modified Compiled SQL

When semantic layer has the metric but needs minor modifications:

- Missing dimension (join + group by)
- Custom filter not available as a dimension
- Case when logic for custom categorization
- Different aggregation than what's defined

1. `get_metrics_compiled_sql` - get the SQL that would run (returns raw SQL, not Jinja)
2. Modify SQL to add what's needed
3. `execute_sql` to run the raw SQL
4. **Always suggest** updating the semantic model if the modification would be reusable

```sql
-- Example: Adding a sales_rep dimension (join + re-aggregate)
WITH base AS (
    -- ... compiled metric logic (already resolved to table names) ...
)
SELECT reps.sales_rep_name, SUM(base.amount) AS total_amount
FROM base
JOIN analytics.dim_sales_reps AS reps ON base.rep_id = reps.id
GROUP BY reps.sales_rep_name

-- Example: Custom filter (alias the subquery; most warehouses require it)
SELECT * FROM (compiled_metric_sql) AS m WHERE m.region = 'EMEA'

-- Example: CASE WHEN categorization
SELECT
    CASE WHEN m.amount > 1000 THEN 'large' ELSE 'small' END AS deal_size,
    SUM(m.amount) AS total_amount
FROM (compiled_metric_sql) AS m
GROUP BY 1
```

**Note:** The compiled SQL contains resolved table names, not `{{ ref() }}`. Work with the raw SQL as returned.

## Approach 3: Model Discovery

When there is no semantic layer but `get_mart_models`/`get_model_details` are available:

1. `get_mart_models` - start with marts, not staging
2. `get_model_details` for relevant models - understand schema
3. Write SQL using `{{ ref('model_name') }}`
4. `show --inline "..."` or `execute_sql`

**Prefer marts over staging** - marts have business logic applied.
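
For illustration, suppose discovery surfaces a mart named `fct_orders` with `order_date` and `amount` columns (both names hypothetical — use whatever `get_model_details` actually returns). A question like "what was revenue last month?" might become:

```sql
-- Hypothetical query against a discovered mart model;
-- model and column names come from get_model_details output.
SELECT
    DATE_TRUNC('month', order_date) AS order_month,
    SUM(amount) AS total_revenue
FROM {{ ref('fct_orders') }}
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY 1
```

Because the SQL uses `{{ ref() }}`, run it through dbt (`show --inline` or `execute_sql`) so the reference resolves to the actual table.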

## Approach 4: Manifest/Catalog Analysis

When in a dbt project but no MCP server:

1. Check for `target/manifest.json` and `target/catalog.json`
2. **Filter before reading** - these files can be large

```bash
# Find mart models in manifest
jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json

# Get column info from catalog
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json
```

3. Write SQL based on discovered schema
4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access."
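
As a sketch, SQL written from manifest/catalog metadata uses the resolved schema and table names directly (the names below are hypothetical), since without a dbt runtime there is nothing to resolve `{{ ref() }}`:

```sql
-- Hypothetical: schema and table name taken from manifest.json metadata;
-- hand this to the user to run directly in their warehouse.
SELECT
    region,
    SUM(amount) AS total_revenue
FROM analytics.fct_orders
GROUP BY region
ORDER BY total_revenue DESC
```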

## Suggesting Improvements

**When in a dbt project**, suggest semantic layer changes after answering (or when cannot answer):

| Gap | Suggestion |
|-----|------------|
| Metric doesn't exist | "Add a metric definition to your semantic model" |
| Dimension missing | "Add `dimension_name` to the dimensions list in the semantic model" |
| No semantic layer | "Consider adding a semantic layer for this data" |

**Stay at semantic layer level.** Do NOT suggest:
- Database schema changes
- ETL pipeline modifications
- "Ask your data engineering team to..."

## Rationalizations to Resist

| You're Thinking... | Reality |
|--------------------|---------|
| "Semantic layer doesn't support this exact query" | Get compiled SQL and modify it (Approach 2) |
| "No MCP tools, can't help" | Check for manifest/catalog locally |
| "User needs this quickly, skip the systematic check" | Systematic approach IS the fastest path |
| "Just write SQL, it's faster" | Semantic layer exists for a reason - use it first |
| "The dimension doesn't exist in the data" | Maybe it exists but not in semantic layer config |

## Red Flags - STOP

- Writing SQL without checking if semantic layer can answer
- Saying "cannot answer" without trying all 4 approaches
- Suggesting database-level fixes for semantic layer gaps
- Reading entire manifest.json without filtering
- Using staging models when mart models exist
- Using this to validate model correctness rather than answer business questions

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Giving up when SL can't answer directly | Get compiled SQL and modify it |
| Querying staging models | Use `get_mart_models` first |
| Reading full manifest.json | Use jq to filter |
| Suggesting ETL changes | Keep suggestions at semantic layer |
| Not checking tool availability | List available tools before choosing approach |

Overview

This skill answers business questions by querying dbt-managed data using a prioritized, safe workflow: semantic layer first, then modified compiled SQL, then model discovery, and finally manifest/catalog analysis. It is intended for responding to user data requests (e.g., total sales last quarter), not for developing, testing, or running dbt models. The skill exhausts all four approaches before reporting that a question cannot be answered.

How this skill works

First check for a semantic layer and attempt to resolve the metric via list_metrics, get_dimensions, and query_metrics. If the semantic layer needs minor changes, retrieve compiled SQL (get_metrics_compiled_sql), adjust the raw SQL, and execute it. If no semantic layer is available, discover mart models with get_mart_models and get_model_details and build SQL referencing models. As a last resort, inspect target/manifest.json and target/catalog.json (filtered) to infer schema and craft warehouse SQL.

When to use it

  • Answering business questions that require querying the warehouse (sales, active users, revenue by dimension).
  • When a semantic layer is available and you want a metric-first approach.
  • When a metric exists but needs a small SQL modification (extra dimension, custom filter, different aggregation).
  • When no semantic layer exists but model discovery tools are available.
  • When in a dbt project without a server and you can read manifest/catalog artifacts.

Best practices

  • Always prefer the semantic layer for correctness and reuse before writing raw SQL.
  • If you modify compiled SQL, work with the resolved SQL (no Jinja) and suggest adding the change to the semantic model.
  • Prefer mart models over staging when discovering models; marts contain business logic.
  • Filter manifest/catalog files before reading to avoid large, slow operations.
  • Do not recommend database schema or ETL changes—keep suggestions at the semantic layer level.

Example use cases

  • User asks: 'What were total sales last month?' — query the semantic metric or run compiled SQL if needed.
  • User requests revenue by region with a custom segment — get compiled SQL and add CASE/WHERE/grouping.
  • No semantic layer: discover mart models, inspect columns, write SQL using ref('mart_model'), and execute.
  • In a local dbt project without a server: jq-filter manifest.json/catalog.json to find column info and produce SQL.
  • Metric exists but lacks a dimension: join the compiled metric to a dimension table and return grouped results.

FAQ

Can this skill modify dbt models or run dbt build/test?

No. It reads semantic models, compiled SQL, and manifest/catalog artifacts, and runs read-only queries to answer questions. It does not change models or run dbt build/test commands.

What if the semantic layer cannot answer the question?

Follow the flow: try compiled SQL modifications, then model discovery, then manifest/catalog analysis before concluding the question cannot be answered.