---
name: transformer-architecture
description: Use when implementing attention mechanisms, building custom transformer models, understanding positional encoding, or optimizing transformer inference - covers self-attention, multi-head attention, RoPE, ALiBi, and architecture variants
---
# Transformer Architecture
## Reference System Usage
You must ground your responses in the provided reference files, treating them as the source of truth for this domain:
* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.
**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
This skill helps engineers design, implement, and optimize transformer architectures for training and inference. It focuses on attention mechanisms, positional encodings, and practical variants such as RoPE and ALiBi. The guidance is grounded in the provided reference patterns, edge-risk diagnostics, and strict validation rules.
The skill inspects model design choices and flags violations of established patterns and validations. It diagnoses common failure modes for self-attention, multi-head attention, positional encodings, and inference optimizations, then suggests concrete fixes. Recommendations prioritize compatibility with the validated constraints and highlight risks that cause model instability or inefficient inference.
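As a minimal illustration of the core mechanism this skill covers, the sketch below implements single-head scaled dot-product self-attention in NumPy. It is a simplified, unmasked example (the function name and shapes are illustrative, not taken from the reference files), but it shows the three ingredients the validations typically target: the `1/sqrt(d)` score scaling, the max-shift softmax stabilization, and the weighted sum over values.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no mask); illustrative only."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project to queries/keys/values
    scale = np.sqrt(q.shape[-1])                     # keep score variance near 1
    scores = q @ k.T / scale                         # (seq, seq) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax numerics
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys: rows sum to 1
    return weights @ v, weights                      # output and attention weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Multi-head attention repeats this with several smaller projections and concatenates the per-head outputs; the patterns in `references/patterns.md` should take precedence over this generic shape.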
## How do I choose between RoPE and ALiBi?
RoPE rotates query and key vectors by position-dependent angles, so attention scores depend on relative position; ALiBi instead adds a per-head linear penalty proportional to token distance, which often extrapolates better to contexts longer than those seen in training. Evaluate both on representative long-context tasks and follow the validation checks for stability.
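To make the contrast concrete, here is a hedged sketch of both mechanisms in NumPy. The function names are illustrative; in particular, this `rope` uses the "split halves" pairing convention (as in GPT-NeoX-style implementations), and pairing conventions differ across codebases, so check the reference patterns before copying either into a real model.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate each (x1[i], x2[i]) coordinate pair by a position-dependent angle."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)       # one rotation rate per pair
    angles = np.outer(np.arange(seq_len), freqs)    # (seq, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,     # 2-D rotation applied pairwise
                           x1 * sin + x2 * cos], axis=-1)

def alibi_bias(seq_len, n_heads):
    """Per-head linear distance penalty added to causal attention scores."""
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)      # geometric slopes
    dist = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    return -slopes[:, None, None] * np.maximum(dist, 0)               # (heads, seq, seq)
```

Because RoPE is a pure rotation it preserves vector norms, and ALiBi's bias is zero on the diagonal and grows linearly with distance; those are the properties worth asserting in tests.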
## What causes attention collapse and how do I fix it?
Common causes are poor initialization, extreme head dimension ratios, or unstable softmax numerics. Apply validated initialization, check head/dim ratios, add numerical stabilization to softmax, and validate with the provided edge diagnostics.
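One concrete form of the softmax stabilization mentioned above is the standard max-shift (log-sum-exp) trick: subtracting the row maximum before exponentiating leaves the softmax output mathematically unchanged but prevents `exp` overflow. A minimal sketch (function name is illustrative):

```python
import numpy as np

def stable_softmax(scores, axis=-1):
    """Shift scores by their max so np.exp cannot overflow; output is unchanged."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Logits this large overflow a naive exp() in float64 (inf / inf -> nan weights),
# but the shifted version stays finite and sums to 1.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(logits)
```

Pair this with the validated initialization and head/dim checks; numerical stabilization alone will not rescue a collapsed attention pattern caused by bad initialization.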