
model-optimization skill

/skills/model-optimization

This skill helps optimize machine learning models for size and speed, including quantization, pruning, distillation, ONNX export, and TensorRT.

npx playbooks add skill omer-metin/skills-for-antigravity --skill model-optimization

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
---
name: model-optimization
description: Use when reducing model size, improving inference speed, or deploying to edge devices - covers quantization, pruning, knowledge distillation, ONNX export, and TensorRT optimization
---

# Model Optimization

## Identity

You are a model optimization specialist: you reduce model size and inference latency and prepare models for edge deployment while preserving acceptable accuracy.

## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill helps reduce model size, improve inference speed, and prepare models for edge deployment using quantization, pruning, knowledge distillation, ONNX export, and TensorRT optimization. It codifies proven patterns and risk checks so you can apply optimizations reliably. Always follow the provided reference files as the authoritative guidance for creation, diagnosis, and validation.
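
As a concrete illustration of the lightest-weight technique above, the sketch below applies post-training dynamic quantization in PyTorch. It is a minimal sketch assuming a toy fp32 network; the layer sizes and the size-comparison helper are illustrative, not prescribed by this skill's references.

```python
import io

import torch
import torch.nn as nn

# Illustrative fp32 model; any nn.Module with Linear layers works the same way.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time, so no
# calibration dataset is required.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size(m: nn.Module) -> int:
    # Rough size proxy: bytes of the serialized state dict.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

print(f"fp32: {serialized_size(model)} B, int8: {serialized_size(quantized)} B")
```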

How this skill works

For any optimization request, the skill consults references/patterns.md to choose the correct transformation pattern (quantize, prune, distill, export, or engine build). It uses references/sharp_edges.md to surface critical failures and trade-offs, and references/validations.md to validate inputs and post-optimization constraints. The skill returns concrete commands, expected outcomes, and safety checks tailored to the model format and target device.

When to use it

  • When you need to reduce model size for mobile or embedded devices
  • When inference latency or throughput must improve without full retraining
  • When target runtime requires ONNX, TensorRT, or other engine formats
  • When regulatory or memory constraints mandate lower-precision models
  • When you want to compress a model while preserving acceptable accuracy

Best practices

  • Always consult references/patterns.md to pick the correct optimization recipe before modifying a model
  • Run the validations in references/validations.md after every transformation to ensure constraints are met
  • Use references/sharp_edges.md to evaluate risks like accuracy drop, numerical instability, or unsupported ops on the target runtime
  • Profile baseline performance and accuracy to quantify the trade-off of each optimization step (a minimal profiling sketch follows this list)
  • Prefer progressive, incremental changes (e.g., fp32→fp16→int8) and keep a reproducible rollback path
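
To make the baseline-profiling practice concrete, here is a minimal latency-measurement sketch; `profile_latency` is a hypothetical helper and the input shape is a placeholder.

```python
import time

import torch

def profile_latency(model: torch.nn.Module, example_input: torch.Tensor,
                    runs: int = 50, warmup: int = 10) -> float:
    """Return the median per-call latency in seconds on a fixed input."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):        # warm up caches and lazy initialization
            model(example_input)
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            model(example_input)
            timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]  # median is robust to outliers

# Usage (placeholder names): record the fp32 baseline once, then re-run
# after every transformation and compare against it.
# baseline = profile_latency(model, torch.randn(1, 3, 224, 224))
```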

Example use cases

  • Quantize a CNN to int8 for a Raspberry Pi-like device and validate accuracy against a holdout set
  • Prune a Transformer encoder to meet a memory budget while checking for catastrophic forgetting
  • Distill a large classification model into a smaller student model for on-device inference (a distillation-loss sketch follows this list)
  • Export a PyTorch model to ONNX, run validations, then build a TensorRT engine for GPU servers (see the export-and-validation sketch below)
  • Validate an optimized model against reference inputs to catch numerical edge cases documented in sharp_edges.md
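
For the distillation use case, one common formulation (softened teacher/student logits blended with hard-label cross-entropy, after Hinton et al.) looks like the sketch below; the temperature `T` and weight `alpha` are illustrative hyperparameters, not values this skill prescribes.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend soft teacher targets with hard-label cross-entropy."""
    # KL divergence between temperature-softened distributions; the T*T
    # factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```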
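
And for the export-and-validation use case, here is a minimal sketch of the PyTorch-to-ONNX path with a numerical parity check; the toy model, input shape, opset version, and tolerances are assumptions to adjust for your model and runtime, and the TensorRT build step is only indicated in a comment.

```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

# Illustrative model and input; substitute your own trained network.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()
example = torch.randn(1, 16)

torch.onnx.export(
    model, example, "model.onnx",
    opset_version=17, input_names=["input"], output_names=["output"],
)

# Parity check: compare ONNX Runtime output against the PyTorch reference
# on the same input before building an engine.
session = ort.InferenceSession("model.onnx")
onnx_out = session.run(None, {"input": example.numpy()})[0]
with torch.no_grad():
    torch_out = model(example).numpy()
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)

# A TensorRT engine can then be built from the validated file, e.g. with
# the `trtexec --onnx=model.onnx` CLI on the target GPU.
```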

FAQ

Which reference should I consult first?

Start with references/patterns.md to choose the correct pattern, then use references/validations.md to check constraints and references/sharp_edges.md to understand risks.

What if an optimization breaks accuracy?

Use the rollback path, retry with a milder setting (e.g., fp16 before int8), and consult references/sharp_edges.md for root causes before retraining or adjusting hyperparameters.
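
As a hedged illustration of that milder fallback, the sketch below casts a model to fp16 and re-measures accuracy; it assumes a GPU-resident model, and `model` and `val_loader` are placeholder names.

```python
import torch

def fp16_accuracy(model: torch.nn.Module, val_loader) -> float:
    """Cast a GPU-resident fp32 model to fp16 and re-measure accuracy."""
    model_fp16 = model.half().eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:  # batches assumed on the same device
            outputs = model_fp16(inputs.half())
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total

# Usage (placeholder names): compare against the recorded fp32 baseline
# before deciding whether to push on to int8.
# print(f"fp16 accuracy: {fp16_accuracy(model, val_loader):.4f}")
```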

Can I skip validations for faster iteration?

No — validations in references/validations.md are required to ensure correctness and safety for deployment.