
integrations-index skill


This skill helps you explore Dagster integrations across AI, ETL, storage, compute, and BI, enabling quick tool selection for a use case.

npx playbooks add skill dagster-io/skills --skill integrations-index

Review the files below or copy the command above to add this skill to your agents.

---
name: integrations-index
description:
  Comprehensive index of 82+ Dagster integrations organized by official tags.yml taxonomy including
  AI (OpenAI, Anthropic), ETL (dbt, Fivetran, Airbyte, PySpark), Storage (Snowflake, BigQuery),
  Compute (AWS, Databricks, Spark), BI (Looker, Tableau), Monitoring, Alerting, and Testing. Use
  when discovering integrations or finding the right tool for a use case.
---

# Dagster Integrations Index

Navigate 82+ Dagster integrations organized by Dagster's official taxonomy. Find AI/ML tools, ETL
platforms, data storage, compute services, BI tools, and monitoring integrations.

## When to Use This Skill vs. Others

| If User Says...           | Use This Skill/Command               | Why                                      |
| ------------------------- | ------------------------------------ | ---------------------------------------- |
| "which integration for X" | `/dagster-integrations`              | Need to discover appropriate integration |
| "does dagster support X"  | `/dagster-integrations`              | Check integration availability           |
| "snowflake vs bigquery"   | `/dagster-integrations`              | Compare integrations in same category    |
| "best practices for X"    | `/dagster-conventions`               | Implementation patterns needed           |
| "implement X integration" | `/dg:prototype`                      | Ready to build with specific integration |
| "how do I use dbt"        | `/dagster-conventions` (dbt section) | dbt-specific implementation patterns     |
| "make this code better"   | `/dignified-python`                  | Python code review needed                |
| "create new project"      | `/dg:create-project`                 | Project initialization needed            |

## Quick Reference by Category

| Category               | Count | Common Tools                          | Reference                  |
| ---------------------- | ----- | ------------------------------------- | -------------------------- |
| **AI & ML**            | 6     | OpenAI, Anthropic, MLflow, W&B        | `references/ai.md`         |
| **ETL/ELT**            | 9     | dbt, Fivetran, Airbyte, PySpark       | `references/etl.md`        |
| **Storage**            | 35+   | Snowflake, BigQuery, Postgres, DuckDB | `references/storage.md`    |
| **Compute**            | 15+   | AWS, Databricks, Spark, Docker, K8s   | `references/compute.md`    |
| **BI & Visualization** | 7     | Looker, Tableau, PowerBI, Sigma       | `references/bi.md`         |
| **Monitoring**         | 3     | Datadog, Prometheus, Papertrail       | `references/monitoring.md` |
| **Alerting**           | 6     | Slack, PagerDuty, MS Teams, Twilio    | `references/alerting.md`   |
| **Testing**            | 2     | Great Expectations, Pandera           | `references/testing.md`    |
| **Other**              | 2+    | Pandas, Polars                        | `references/other.md`      |

## Category Taxonomy

This index aligns with Dagster's official documentation taxonomy from `tags.yml`:

- **ai**: Artificial intelligence and machine learning integrations (LLM APIs, experiment tracking)
- **etl**: Extract, transform, and load tools including data replication and transformation
  frameworks
- **storage**: Databases, data warehouses, object storage, and table formats
- **compute**: Cloud platforms, container orchestration, and distributed processing frameworks
- **bi**: Business intelligence and visualization platforms
- **monitoring**: Observability platforms and metrics systems for tracking performance
- **alerting**: Notification and incident management systems for pipeline alerts
- **testing**: Data quality validation and testing frameworks
- **other**: Miscellaneous integrations including DataFrame libraries

**Note**: Support levels (dagster-supported, community-supported) are shown inline in each
integration entry.

Last verified: 2026-01-27

## Finding the Right Integration

### I need to...

**Load data from external sources**

- SaaS applications → [ETL](#etlelt) (Fivetran, Airbyte)
- Files/databases → [ETL](#etlelt) (dlt, Sling, Meltano)
- Cloud storage → [Storage](#storage) (S3, GCS, Azure Blob)

**Transform data**

- SQL transformations → [ETL](#etlelt) (dbt)
- Distributed transformations → [ETL](#etlelt) (PySpark)
- DataFrame operations → [Other](#other) (Pandas, Polars)
- Large-scale processing → [Compute](#compute) (Spark, Dask, Ray)

**Store data**

- Cloud data warehouse → [Storage](#storage) (Snowflake, BigQuery, Redshift)
- Relational database → [Storage](#storage) (Postgres, MySQL)
- File/object storage → [Storage](#storage) (S3, GCS, Azure, LakeFS)
- Analytics database → [Storage](#storage) (DuckDB)
- Vector embeddings → [Storage](#storage) (Weaviate, Chroma, Qdrant)

**Validate data quality**

- Schema validation → [Testing](#testing) (Pandera)
- Quality checks → [Testing](#testing) (Great Expectations)

**Run ML workloads**

- LLM integration → [AI](#ai--ml) (OpenAI, Anthropic, Gemini)
- Experiment tracking → [AI](#ai--ml) (MLflow, W&B)
- Distributed training → [Compute](#compute) (Ray, Spark)

**Execute computation**

- Cloud compute → [Compute](#compute) (AWS, Azure, GCP, Databricks)
- Containers → [Compute](#compute) (Docker, Kubernetes)
- Distributed processing → [Compute](#compute) (Spark, Dask, Ray)

**Monitor pipelines**

- Team notifications → [Alerting](#alerting) (Slack, MS Teams, PagerDuty)
- Metrics tracking → [Monitoring](#monitoring) (Datadog, Prometheus)
- Log aggregation → [Monitoring](#monitoring) (Papertrail)

**Visualize data**

- BI dashboards → [BI](#bi--visualization) (Looker, Tableau, PowerBI)
- Analytics platform → [BI](#bi--visualization) (Sigma, Hex, Evidence)

## Integration Categories

### AI & ML

Artificial intelligence and machine learning platforms, including LLM APIs and experiment tracking.

**Key integrations:**

- **OpenAI** - GPT models and embeddings API
- **Anthropic** - Claude AI models
- **Gemini** - Google's multimodal AI
- **MLflow** - Experiment tracking and model registry
- **Weights & Biases** - ML experiment tracking
- **NotDiamond** - LLM routing and optimization

See `references/ai.md` for all AI/ML integrations.

### ETL/ELT

Extract, transform, and load tools for data ingestion, transformation, and replication.

**Key integrations:**

- **dbt** - SQL-based transformation with automatic dependencies
- **Fivetran** - Automated SaaS data ingestion (component-based)
- **Airbyte** - Open-source ELT platform
- **dlt** - Python-based data loading (component-based)
- **Sling** - High-performance data replication (component-based)
- **PySpark** - Distributed data transformation
- **Meltano** - ELT for the modern data stack

See `references/etl.md` for all ETL/ELT integrations.

### Storage

Data warehouses, databases, object storage, vector databases, and table formats.

**Key integrations:**

- **Snowflake** - Cloud data warehouse with IO managers
- **BigQuery** - Google's serverless data warehouse
- **DuckDB** - In-process SQL analytics
- **Postgres** - Open-source relational database
- **Weaviate** - Vector database for AI search
- **Delta Lake** - ACID transactions for data lakes
- **DataHub** - Metadata catalog and lineage

See `references/storage.md` for all storage integrations.

### Compute

Cloud platforms, container orchestration, and distributed processing frameworks.

**Key integrations:**

- **AWS** - Cloud compute services (Glue, EMR, Lambda)
- **Databricks** - Unified analytics platform
- **GCP** - Google Cloud compute (Dataproc, Cloud Run)
- **Spark** - Distributed data processing engine
- **Dask** - Parallel computing framework
- **Docker** - Container execution with Pipes
- **Kubernetes** - Cloud-native orchestration
- **Ray** - Distributed computing for ML

See `references/compute.md` for all compute integrations.

### BI & Visualization

Business intelligence and visualization platforms for analytics and reporting.

**Key integrations:**

- **Looker** - Google's BI platform
- **Tableau** - Interactive dashboards
- **PowerBI** - Microsoft's BI tool
- **Sigma** - Cloud analytics platform
- **Hex** - Collaborative notebooks
- **Evidence** - Markdown-based BI
- **Cube** - Semantic layer platform

See `references/bi.md` for all BI integrations.

### Monitoring

Observability platforms and metrics systems for tracking pipeline performance.

**Key integrations:**

- **Datadog** - Comprehensive observability platform
- **Prometheus** - Time-series metrics collection
- **Papertrail** - Centralized log management

See `references/monitoring.md` for all monitoring integrations.

### Alerting

Notification and incident management systems for pipeline alerts.

**Key integrations:**

- **Slack** - Team messaging and alerts
- **PagerDuty** - Incident management for on-call
- **MS Teams** - Microsoft Teams notifications
- **Twilio** - SMS and voice notifications
- **Apprise** - Universal notification platform
- **DingTalk** - Alibaba's team messaging platform

See `references/alerting.md` for all alerting integrations.

### Testing

Data quality validation and testing frameworks for ensuring data reliability.

**Key integrations:**

- **Great Expectations** - Data validation with expectations
- **Pandera** - Statistical data validation for DataFrames

See `references/testing.md` for all testing integrations.

### Other

Miscellaneous integrations including DataFrame libraries and utility tools.

**Key integrations:**

- **Pandas** - In-memory DataFrame library
- **Polars** - Fast DataFrame library with columnar storage

See `references/other.md` for other integrations.

## References

Integration details are organized in the following files:

- **AI & ML**: `references/ai.md` - AI and ML platforms, LLM APIs, experiment tracking
- **ETL/ELT**: `references/etl.md` - Data ingestion, transformation, and replication tools
- **Storage**: `references/storage.md` - Warehouses, databases, object storage, vector DBs
- **Compute**: `references/compute.md` - Cloud platforms, containers, distributed processing
- **BI & Visualization**: `references/bi.md` - Business intelligence and analytics platforms
- **Monitoring**: `references/monitoring.md` - Observability and metrics systems
- **Alerting**: `references/alerting.md` - Notifications and incident management
- **Testing**: `references/testing.md` - Data quality and validation frameworks
- **Other**: `references/other.md` - DataFrame libraries and miscellaneous tools

## Using Integrations

Most Dagster integrations follow a common pattern:

1. **Install the package**:

   ```bash
   pip install dagster-<integration>
   ```

2. **Import and configure a resource**:

   ```python
   import dagster as dg
   from dagster_<integration> import <Integration>Resource

   resource = <Integration>Resource(
       config_param=dg.EnvVar("ENV_VAR")
   )
   ```

3. **Use in your assets**:
   ```python
   @dg.asset
   def my_asset(integration: <Integration>Resource):
       # Use the integration
       pass
   ```

For component-based integrations (dbt, Fivetran, dlt, Sling), see the specific reference files for
scaffolding and configuration patterns.

Overview

This skill provides a searchable index of 82+ Dagster integrations organized by Dagster's official tags.yml taxonomy. It helps teams discover integrations across AI/ML, ETL, storage, compute, BI, monitoring, alerting, and testing. Use it to quickly find supported tools, compare options in the same category, or verify whether Dagster has an integration for a specific vendor.

How this skill works

The skill maps integrations to Dagster taxonomy categories (ai, etl, storage, compute, bi, monitoring, alerting, testing, other) and lists common tools per category with support-level hints (dagster-supported vs community-supported). It provides quick decision guidance for common tasks (load, transform, store, validate, monitor, alert, visualize) and links to per-category reference content containing installation and configuration patterns. A last-verified date is included so users know how current the index is.

When to use it

  • Discover what integrations Dagster supports for a specific tool or provider
  • Compare candidate integrations within the same category (e.g., Snowflake vs BigQuery)
  • Find integrations for common tasks: ingestion, transformation, storage, compute, monitoring, alerting
  • Choose integrations for AI/ML workflows (LLMs, experiment tracking)
  • Locate integrations for data quality and testing frameworks
  • Identify component-based integrations (dbt, Fivetran, Airbyte, dlt, Sling) before implementation

Best practices

  • Start from the category that matches your primary need (etl, storage, compute, etc.) to narrow options fast
  • Prefer dagster-supported packages if enterprise support and long-term maintenance matter
  • For SQL transformations, use dbt integration and follow provided scaffolding patterns
  • Use component-based integrations (Fivetran, Airbyte, dlt) when you want managed connectors and minimal wiring
  • Pair monitoring and alerting integrations (e.g., Datadog + PagerDuty or Slack) for end-to-end observability
  • Review the referenced category docs for installation, resource configuration, and example asset usage before coding

Example use cases

  • Evaluate which storage backend to use for analytics: Snowflake vs BigQuery vs DuckDB
  • Add LLM capabilities using OpenAI or Anthropic integrations for an ML pipeline
  • Wire a SaaS ingestion pipeline with Fivetran or Airbyte and orchestrate loads in Dagster
  • Configure compute execution on Databricks or Spark for large-scale transformations
  • Set up pipeline monitoring with Datadog and alerting via Slack or PagerDuty
  • Integrate Great Expectations or Pandera for automated data quality checks in assets

FAQ

Does this index show support level for each integration?

Yes. Entries indicate whether an integration is dagster-supported or community-supported when that information is available.

How do I install an integration listed here?

Most packages install with `pip install dagster-<integration>`; the per-category reference files include configuration and usage patterns.