This skill automates Spark SQL optimizer guidance, producing production-ready configurations, best-practice patterns, and validation results for efficient data pipelines.

Add this skill to your agents with:

`npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill spark-sql-optimizer`
---
name: "spark-sql-optimizer"
description: |
  Optimize Spark SQL queries and configurations. Auto-activating skill for Data Pipelines.
  Triggers on phrases like "spark sql optimizer", "spark optimizer", "spark".
  Part of the Data Pipelines skill category. Use when working with Spark SQL
  query tuning, join strategies, partitioning, or performance troubleshooting.
allowed-tools: "Read, Write, Edit, Bash(cmd:*), Grep"
version: 1.0.0
license: MIT
author: "Jeremy Longshore <[email protected]>"
---
# Spark SQL Optimizer
## Overview
This skill provides automated assistance for Spark SQL optimization tasks within the Data Pipelines domain.
## When to Use
This skill activates automatically when you:
- Mention "Spark SQL optimizer" in your request
- Ask about Spark SQL optimization patterns or best practices
- Need help with data pipeline tasks such as ETL, data transformation, workflow orchestration, or streaming data processing
## Instructions
1. Provides step-by-step guidance for Spark SQL optimization
2. Follows industry best practices and patterns
3. Generates production-ready code and configurations
4. Validates outputs against common standards
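To make "production-ready configurations" concrete, here is a hedged sketch of the kind of Spark 3.x tuning config this skill might generate. The keys are real Spark settings; the chosen values are illustrative defaults for this example, not universal recommendations, and should be tuned to your cluster.

```python
# Illustrative Spark SQL tuning configuration. Keys are real Spark
# settings; the values are example defaults, not universal advice.
RECOMMENDED_CONFS = {
    # Adaptive Query Execution (Spark 3.x): coalesces shuffle
    # partitions and mitigates skewed joins at runtime.
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Tables smaller than this threshold are broadcast instead of
    # shuffled (10 MB here; size against your driver memory).
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
    # Baseline shuffle parallelism when AQE cannot decide.
    "spark.sql.shuffle.partitions": "200",
}

def to_submit_flags(confs):
    """Render a conf dict as spark-submit --conf flags."""
    return [f"--conf {key}={value}" for key, value in sorted(confs.items())]
```

The same dict can be applied via `SparkSession.builder.config(...)` instead of `spark-submit` flags; the skill's output would include whichever form fits your deployment.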
## Examples
**Example: Basic Usage**
Request: "Help me with spark sql optimizer"
Result: Provides step-by-step guidance and generates appropriate configurations
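As an example of the code the skill can generate, the sketch below adds a Spark SQL `BROADCAST` hint for a small dimension table, one of the most common join optimizations. It is a naive string rewrite for illustration only, not a real SQL parser; `add_broadcast_hint` is a hypothetical helper name.

```python
def add_broadcast_hint(sql, small_table):
    """Insert a Spark SQL BROADCAST hint for a small dimension table.

    Assumes the query begins with SELECT. This is a purely
    illustrative string rewrite, not a robust SQL transformation.
    """
    hint = f"/*+ BROADCAST({small_table}) */"
    return sql.replace("SELECT", f"SELECT {hint}", 1)
```

Broadcasting the small side of a join avoids a full shuffle of the large fact table, which is often the single biggest win for star-schema queries.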
## Prerequisites
- Relevant development environment configured
- Access to necessary tools and services
- Basic understanding of data pipelines concepts
## Output
- Generated configurations and code
- Best practice recommendations
- Validation results
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Configuration invalid | Missing required fields | Check documentation for required parameters |
| Tool not found | Dependency not installed | Install required tools per prerequisites |
| Permission denied | Insufficient access | Verify credentials and permissions |
## Resources
- Official documentation for related tools
- Best practices guides
- Community examples and tutorials
## Related Skills
Part of the **Data Pipelines** skill category.
Tags: etl, airflow, spark, streaming, data-engineering
## Description
This skill automates optimization tasks for Spark SQL within data pipelines, offering actionable guidance, code snippets, and configuration templates. It auto-activates when Spark SQL optimizer topics are detected and focuses on improving query performance, resource usage, and reliability. Use it to translate best practices into production-ready changes and validation checks.

The skill inspects query plans, execution metrics, and configuration settings to identify bottlenecks such as shuffle-heavy operations, skew, and suboptimal joins. It suggests rewrite patterns, partitioning strategies, caching options, and Spark configuration tweaks, and can generate code examples and validation steps. Outputs include optimized SQL transforms, Spark job configs, and testable validation checks against common standards.
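Plan inspection of the kind described above can be sketched very simply: Spark renders shuffles as `Exchange` operators in the physical plan (`EXPLAIN` output), so counting them gives a rough proxy for shuffle cost. The helper name below is hypothetical.

```python
def shuffle_stages(plan_text):
    """Count Exchange (shuffle) operators in a physical-plan string.

    A coarse heuristic: each Exchange in Spark's EXPLAIN output marks
    a shuffle boundary, so more Exchanges usually means more network
    and disk I/O.
    """
    return sum(1 for line in plan_text.splitlines() if "Exchange" in line)
```

A real analysis would also distinguish hash, range, and broadcast exchanges, but even this count flags queries worth a closer look.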
## FAQ
**Does the skill modify my cluster automatically?**
No. It provides recommended configuration and code; you apply changes in your environment after review.
**What inputs does it need to analyze a job?**
Provide the SQL query, EXPLAIN output or physical plan, and job metrics (task durations, shuffle sizes) for best recommendations.
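Task-duration metrics feed directly into skew detection. One common heuristic, sketched here with a hypothetical helper, compares the slowest task to the median: a ratio well above ~2 often signals a skewed partition.

```python
import statistics

def skew_ratio(task_durations):
    """Max task duration over the median duration.

    A ratio near 1 means balanced partitions; a ratio well above ~2
    is a common (heuristic) signal of data skew worth investigating.
    """
    return max(task_durations) / statistics.median(task_durations)
```

The same ratio can be computed over per-task shuffle-read sizes to distinguish data skew from slow executors.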
**Can it help with streaming jobs?**
Yes. It suggests state management, watermarking, checkpointing, and micro-batch sizing for streaming SQL.
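Micro-batch sizing of the kind mentioned above reduces to simple arithmetic: cap each batch so that, at the observed input rate, processing fits within the trigger interval with some headroom. The function name and 80% headroom default are illustrative assumptions; in Kafka sources the resulting cap maps to a setting like `maxOffsetsPerTrigger`.

```python
def micro_batch_rows(input_rows_per_sec, trigger_interval_sec, headroom=0.8):
    """Suggest a per-micro-batch row cap for a streaming job.

    Sizes the batch so processing fits inside the trigger interval
    with headroom to spare (default 80% of the interval's capacity).
    """
    return int(input_rows_per_sec * trigger_interval_sec * headroom)
```

For example, at 1,000 rows/sec with a 10-second trigger, this suggests capping batches at 8,000 rows, leaving 20% of each interval as slack for stragglers and state maintenance.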