---
name: data-engineering
description: Master data engineering, ETL/ELT, data warehousing, SQL optimization, and analytics. Use when building data pipelines, designing data systems, or working with large datasets.
sasmp_version: "1.3.0"
bonded_agent: 04-data-engineering-analytics
bond_type: PRIMARY_BOND
---
# Data Engineering & Analytics Skill
## Quick Start - SQL Data Pipeline
```sql
-- Create staging table
CREATE TABLE staging_events AS
SELECT
    event_id,
    user_id,
    event_type,
    event_time,
    properties
FROM raw_events
WHERE event_time >= CURRENT_DATE - INTERVAL '1 day'
  AND event_type IN ('click', 'purchase', 'view');

-- Aggregate metrics
SELECT
    DATE(event_time) AS date,
    user_id,
    COUNT(*) AS event_count,
    COUNT(DISTINCT event_type) AS unique_events
FROM staging_events
GROUP BY 1, 2
ORDER BY date DESC, event_count DESC;
```
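The same stage-then-aggregate flow can be exercised locally with Python's built-in `sqlite3` module. This is a sketch, not warehouse code: the date filter is rewritten in SQLite's dialect (`datetime('now', '-1 day')` instead of `CURRENT_DATE - INTERVAL '1 day'`), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database standing in for a warehouse (toy data, sketch only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (
    event_id INTEGER, user_id INTEGER, event_type TEXT,
    event_time TEXT, properties TEXT
);
INSERT INTO raw_events VALUES
    (1, 10, 'click',    datetime('now'), '{}'),
    (2, 10, 'purchase', datetime('now'), '{}'),
    (3, 20, 'view',     datetime('now'), '{}'),
    (4, 20, 'signup',   datetime('now'), '{}');  -- filtered out below
""")

# Stage recent events of interest, then aggregate per user per day.
conn.execute("""
CREATE TABLE staging_events AS
SELECT event_id, user_id, event_type, event_time, properties
FROM raw_events
WHERE event_time >= datetime('now', '-1 day')
  AND event_type IN ('click', 'purchase', 'view')
""")
rows = conn.execute("""
SELECT DATE(event_time) AS date,
       user_id,
       COUNT(*) AS event_count,
       COUNT(DISTINCT event_type) AS unique_events
FROM staging_events
GROUP BY 1, 2
ORDER BY date DESC, event_count DESC
""").fetchall()
print(rows)
```

Running this yields one row per (date, user) pair, with the `'signup'` event excluded by the staging filter — the same shape the warehouse query above produces.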
## Core Technologies
### Data Processing
- Apache Spark
- Apache Flink
- Pandas / Polars
- dbt (data transformation)
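All of these engines implement variations of the same split-apply-combine pattern. A minimal stdlib sketch of that pattern (toy data, no external libraries) makes it concrete — Pandas, Polars, and Spark generalize exactly this loop across columns, partitions, and clusters:

```python
from collections import defaultdict

# Toy events: (user_id, event_type) pairs.
events = [
    (10, "click"), (10, "purchase"), (10, "click"),
    (20, "view"),  (20, "view"),
]

# Split: bucket rows by key.
buckets = defaultdict(list)
for user_id, event_type in events:
    buckets[user_id].append(event_type)

# Apply/combine: count rows and distinct event types per key.
summary = {
    user: {"event_count": len(types), "unique_events": len(set(types))}
    for user, types in buckets.items()
}
print(summary)
# {10: {'event_count': 3, 'unique_events': 2}, 20: {'event_count': 2, 'unique_events': 1}}
```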
### Data Warehousing
- Snowflake
- BigQuery (GCP)
- Redshift (AWS)
- Azure Synapse
### ETL/ELT Tools
- dbt
- Airflow
- Talend
- Informatica
### Streaming
- Apache Kafka
- AWS Kinesis
- Apache Pulsar
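A core streaming concept behind these systems is windowed aggregation. The sketch below shows a tumbling-window count in plain Python over a finite event list — the kind of computation Kafka Streams or Flink would run continuously over an unbounded stream (timestamps and keys are invented):

```python
from collections import defaultdict

WINDOW = 60  # tumbling window size in seconds

# Toy stream: (timestamp_seconds, event_key) pairs.
events = [(5, "click"), (42, "click"), (61, "view"), (130, "click")]

# Bucket each event's timestamp into its window start, then count per key.
windows = defaultdict(lambda: defaultdict(int))
for ts, key in events:
    window_start = (ts // WINDOW) * WINDOW
    windows[window_start][key] += 1

for start, counts in sorted(windows.items()):
    print(f"window [{start}, {start + WINDOW}): {dict(counts)}")
```

Events at t=5 and t=42 land in the [0, 60) window, t=61 in [60, 120), and t=130 in [120, 180); a real stream processor adds watermarking and late-event handling on top of this bucketing.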
### ML & Analytics
- scikit-learn
- TensorFlow
- Tableau / Power BI
## Best Practices
1. **Data Quality** - Validation and testing
2. **Documentation** - Clear metadata
3. **Performance** - Query optimization
4. **Governance** - Data security
5. **Monitoring** - Pipeline alerts
6. **Scalability** - Design for growth
7. **Version Control** - Git for code and configs
8. **Testing** - Data and pipeline testing
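Practices 1 and 8 can start very small. This is a minimal data-quality gate in plain Python — schema and null checks of the kind that dbt tests formalize declaratively (the column set and sample rows are assumptions for illustration):

```python
# Hypothetical expected schema for the staging table.
EXPECTED_COLUMNS = {"event_id", "user_id", "event_type"}

def validate(rows):
    """Return a list of data-quality errors; empty means the batch passes."""
    errors = []
    for i, row in enumerate(rows):
        if set(row) != EXPECTED_COLUMNS:
            errors.append(f"row {i}: unexpected columns {sorted(row)}")
        elif any(v is None for v in row.values()):
            errors.append(f"row {i}: null value in {row}")
    return errors

rows = [
    {"event_id": 1, "user_id": 10, "event_type": "click"},
    {"event_id": 2, "user_id": 20, "event_type": "view"},
]
print(validate(rows))  # empty list: the batch is clean
```

Running checks like these before the serving layer (and failing the pipeline run when the error list is non-empty) keeps bad batches out of downstream reports.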
## Resources
- [Apache Spark Documentation](https://spark.apache.org/)
- [dbt Documentation](https://docs.getdbt.com/)
- [SQL Mode Tutorial](https://mode.com/sql-tutorial/)
- [Kaggle](https://www.kaggle.com/)
This skill teaches practical data engineering and analytics: building ETL/ELT pipelines, designing data warehouses, optimizing SQL, and preparing data for analytics or ML. It focuses on scalable tooling and real-world patterns so you can move from raw events to reliable, queryable datasets quickly. The guidance covers batch and streaming, testing, monitoring, and governance to keep pipelines production-ready.
The skill inspects pipeline design, transformation logic, and operational patterns to recommend improvements and templates. It evaluates data ingestion, staging, transformation, and serving layers using SQL snippets, orchestration patterns, and tooling choices. Recommendations include query optimizations, partitioning and clustering, schema design for analytics, and monitoring/test strategies.
## FAQ

**Which tools should I choose for a cloud-first pipeline?**
Pick a managed warehouse (BigQuery, Snowflake, or Redshift) and pair it with an orchestrator such as Airflow (or a managed equivalent) plus dbt for transformations; choose Kafka or Kinesis for high-throughput streaming.

**How do I ensure data quality before reporting?**
Implement schema checks, null/consistency tests, row-count comparisons, and end-to-end integration tests; fail fast and surface issues through monitoring dashboards and alerts.
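"Fail fast and surface issues" can be as simple as a reconciliation check that logs and raises on a mismatch. A sketch, assuming the logger is wired to an alerting sink in production:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.quality")  # hypothetical logger name

def check_row_counts(source_count, target_count, tolerance=0):
    """Reconcile source vs. target row counts; raise to fail the run fast."""
    diff = abs(source_count - target_count)
    if diff > tolerance:
        log.error("row-count mismatch: source=%d target=%d",
                  source_count, target_count)
        raise ValueError(f"row-count mismatch ({diff} rows)")
    log.info("row counts reconciled: %d rows", target_count)
    return True

check_row_counts(1000, 1000)  # passes; a mismatch would raise ValueError
```

Raising inside the orchestrated task marks the run as failed, which is what lets the orchestrator's alerting (and any dashboard built on task state) surface the problem immediately.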