
Advanced skill


This skill helps you build, validate, and deploy machine learning models and predictive analytics solutions using big data techniques across cloud platforms.

npx playbooks add skill pluginagentmarketplace/custom-plugin-data-analyst --skill advanced

Review the files below or copy the command above to add this skill to your agents.

Files (7)
SKILL.md
2.7 KB
---
name: advanced-analytics
description: Advanced analytics including machine learning, predictive modeling, and big data techniques
version: "2.0.0"
sasmp_version: "2.0.0"
bonded_agent: 06-advanced-analytics-specialist
bond_type: PRIMARY_BOND

# Skill Configuration
config:
  atomic: true
  retry_enabled: true
  max_retries: 3
  backoff_strategy: exponential
  model_training_timeout: 3600

# Parameter Validation
parameters:
  skill_level:
    type: string
    required: true
    enum: [intermediate, advanced, expert]
    default: intermediate
  focus_area:
    type: string
    required: false
    enum: [regression, classification, clustering, timeseries, feature_engineering, all]
    default: all
  deployment_target:
    type: string
    required: false
    enum: [notebook, api, batch, realtime]
    default: notebook

# Observability
observability:
  logging_level: info
  metrics: [model_accuracy, training_time, prediction_latency, feature_importance]
  model_versioning: true
---

# Advanced Analytics Skill

## Overview
Master advanced analytics techniques including machine learning, predictive modeling, and big data processing for sophisticated data analysis.

## Core Topics

### Machine Learning Fundamentals
- Supervised vs unsupervised learning
- Classification algorithms (logistic regression, decision trees, random forest)
- Regression algorithms (linear, polynomial, ensemble methods)
- Clustering (K-means, hierarchical, DBSCAN)
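
The supervised side of the list above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available; the synthetic dataset stands in for real features:

```python
# Minimal supervised-learning sketch: train and evaluate a random forest
# classifier on synthetic data (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data stands in for real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

The same fit/predict/score pattern applies to the regression and clustering estimators listed above.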

### Predictive Analytics
- Time series forecasting (ARIMA, exponential smoothing)
- Customer segmentation and RFM analysis
- Churn prediction models
- A/B testing and experimentation
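
Of the forecasting methods above, exponential smoothing is simple enough to write out by hand, which makes the recurrence explicit (ARIMA, by contrast, needs a library such as statsmodels). The sales figures below are illustrative, not real data:

```python
# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1}.
# The last smoothed value serves as a naive one-step-ahead forecast.
def exponential_smoothing(series, alpha=0.3):
    smoothed = [series[0]]  # initialize with the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
fitted = exponential_smoothing(sales, alpha=0.3)
forecast = fitted[-1]  # one-step-ahead forecast
print(f"next-period forecast: {forecast:.1f}")
```

A larger `alpha` weights recent observations more heavily; a smaller one smooths more aggressively.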

### Big Data Technologies
- Introduction to Spark and PySpark
- Data lakes and data mesh concepts
- Cloud analytics platforms (AWS, GCP, Azure)
- Real-time analytics with streaming data

### Advanced Techniques
- Feature engineering best practices
- Model validation and cross-validation
- Hyperparameter tuning
- Model deployment considerations
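
Cross-validation and hyperparameter tuning from the list above combine naturally in a grid search. A small sketch, again assuming scikit-learn and using a deliberately tiny grid:

```python
# Hyperparameter tuning sketch: grid search with 5-fold cross-validation
# over a small random-forest grid (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` trades exhaustiveness for speed.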

## Learning Objectives
- Build and validate machine learning models
- Implement predictive analytics solutions
- Work with big data technologies
- Apply advanced statistical techniques

## Error Handling

| Error Type | Cause | Recovery |
|------------|-------|----------|
| Overfitting | Model too complex | Add regularization, reduce features |
| Underfitting | Model too simple | Add features, increase complexity |
| Data leakage | Target info in features | Review feature engineering pipeline |
| Class imbalance | Skewed target | Use SMOTE, class weights, or resampling |
| Convergence failure | Poor hyperparameters | Grid search, adjust learning rate |
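
For the class-imbalance row above, class weighting is often the lowest-effort fix. A sketch (scikit-learn assumed) that simulates a rare positive class, such as churners, and compares minority-class recall with and without weighting:

```python
# Class-imbalance sketch: compare minority-class recall with and without
# class weighting (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# A 95/5 class split simulates a rare positive class (e.g. churners).
X, y = make_classification(
    n_samples=2000, weights=[0.95, 0.05], flip_y=0, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

print("minority recall, unweighted:", recall_score(y_te, plain.predict(X_te)))
print("minority recall, weighted:  ", recall_score(y_te, weighted.predict(X_te)))
```

SMOTE and resampling, the table's other remedies, require the separate imbalanced-learn package.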

## Related Skills
- statistics (for foundational statistical knowledge)
- programming (for ML implementation)
- databases-sql (for big data querying)

Overview

This skill teaches advanced analytics techniques including machine learning, predictive modeling, and big data processing to solve practical data problems. It focuses on building, validating, and deploying models while handling real-world data challenges. The material combines statistical rigor with scalable tools for production analytics.

How this skill works

The skill inspects datasets, guides feature engineering, and applies supervised and unsupervised algorithms for classification, regression, and clustering. It includes time series forecasting and experiment design for predictive use cases. For large-scale needs it demonstrates Spark/PySpark workflows and cloud analytics patterns, plus model validation and deployment best practices.

When to use it

  • Building classification or regression models for business decisions
  • Designing predictive pipelines for churn, demand, or sales forecasting
  • Segmenting customers using clustering and RFM analysis
  • Scaling analytics to big data with Spark or cloud platforms
  • Running A/B tests and interpreting experiment results

Best practices

  • Start with clear problem framing and evaluation metrics tied to business outcomes
  • Prioritize robust feature engineering and pipeline reproducibility
  • Use cross-validation and proper train/validation/test splits to avoid data leakage
  • Address class imbalance and overfitting with sampling, weighting, and regularization
  • Automate hyperparameter tuning and monitor models after deployment
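
The split and leakage points above can be made concrete with a scikit-learn Pipeline (assumed available): keeping preprocessing inside the pipeline means the scaler is refit within each cross-validation fold and never sees validation data:

```python
# Leakage-safe preprocessing sketch: StandardScaler is fit inside each CV
# fold via a Pipeline, rather than once on the full dataset
# (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=12, random_state=7)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The same pattern extends to imputation, encoding, and feature selection steps, all of which leak if fit on the full dataset.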

Example use cases

  • Customer churn prediction using ensemble models and RFM features
  • Sales forecasting with ARIMA or exponential smoothing for inventory planning
  • Product recommendation prototypes using clustering and supervised ranking
  • Real-time anomaly detection with streaming data on Spark or cloud services
  • A/B test analysis to measure lift and guide product decisions

FAQ

What data size is required to apply these techniques?

Techniques scale from small datasets to big data. Start with sample data for prototyping; use Spark or cloud tools when datasets exceed single-machine capacity.

How do I prevent data leakage?

Isolate future information from training features, use time-based splits for time series, and validate pipelines end-to-end before model selection.
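
For the time-series case mentioned above, a `TimeSeriesSplit` sketch (scikit-learn assumed) shows the key property of a time-based split: every training fold strictly precedes its validation fold:

```python
# Time-based splitting sketch: TimeSeriesSplit places training indices
# strictly before validation indices, so no future data leaks into
# training (assumes scikit-learn is installed).
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # observations ordered by time
tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    assert train_idx.max() < val_idx.min()  # train strictly precedes validation
    print(f"fold {fold}: train up to t={train_idx.max()}, "
          f"validate t={val_idx.min()}..{val_idx.max()}")
```

Shuffled K-fold splits, by contrast, would mix future observations into training and inflate apparent accuracy.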