
unstuck-scaling skill


This skill helps you improve AI reliability by identifying specific bottlenecks, addressing them with targeted fixes, and quantitatively tuning the system through fast feedback loops.

npx playbooks add skill coowoolf/insighthunt-skills --skill unstuck-scaling


---
name: unstuck-scaling
description: Use when AI agents frequently hit dead ends, when reliability is the main constraint on scaling utility, or when general model improvements don't solve specific blockers
---

# The Unstuck Scaling Framework

## Overview

A systematic approach to improving AI reliability by treating **"getting stuck"** as the primary bottleneck. Instead of broad improvements, painstakingly identify specific failure modes and create tight feedback loops.

**Core principle:** Address specific bottlenecks, not general intelligence.

## The Cycle

```
┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│     ┌───────────────────┐                                       │
│     │  IDENTIFY         │                                       │
│     │  'Stuck' Points   │                                       │
│     │  (auth, payments) │                                       │
│     └─────────┬─────────┘                                       │
│               │                                                  │
│               ▼                                                  │
│     ┌───────────────────┐                                       │
│     │  ADDRESS          │                                       │
│     │  Specific         │                                       │
│     │  Bottlenecks      │                                       │
│     └─────────┬─────────┘                                       │
│               │                                                  │
│               ▼                                                  │
│     ┌───────────────────┐                                       │
│     │  QUANTITATIVELY   │                                       │
│     │  Tune System      │                                       │
│     │  (pass/fail rate) │                                       │
│     └─────────┬─────────┘                                       │
│               │                                                  │
│               ▼                                                  │
│     ┌───────────────────┐                                       │
│     │  FAST FEEDBACK    │─────────────────────────┐             │
│     │  Loop             │                         │             │
│     └───────────────────┘                         │             │
│               ▲                                   │             │
│               └───────────────────────────────────┘             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
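The cycle above can be sketched as a minimal measurement loop. This is an illustrative sketch, not part of the skill itself: the session schema and flow names (`auth`, `payments`) are hypothetical assumptions.

```python
from collections import Counter

# Hypothetical session records; field names are illustrative assumptions.
sessions = [
    {"flow": "auth", "stuck": True},
    {"flow": "auth", "stuck": True},
    {"flow": "payments", "stuck": False},
    {"flow": "auth", "stuck": False},
    {"flow": "payments", "stuck": True},
]

def stuck_rate(records):
    """TUNE: the pass/fail metric driven down on each pass through the loop."""
    return sum(r["stuck"] for r in records) / len(records) if records else 0.0

def identify_stuck_points(records):
    """IDENTIFY: rank flows by how often sessions got stuck in them."""
    return Counter(r["flow"] for r in records if r["stuck"]).most_common()

print(identify_stuck_points(sessions))  # [('auth', 2), ('payments', 1)]
print(stuck_rate(sessions))             # 0.6
```

Each iteration of the loop re-runs these measurements after a fix ships, closing the fast feedback cycle.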

## Key Principles

| Principle | Description |
|-----------|-------------|
| **Specific blockers** | Identify exact points where AI fails |
| **Quantitative tuning** | Measure stuck rates, not vibes |
| **Fast feedback** | Rapid iteration on fixes |
| **Bottleneck focus** | Specific roadblocks > general intelligence |

## Common Mistakes

- Focusing on general model improvements
- Failing to measure "stuck" rates quantitatively
- Slow feedback loops preventing rapid iteration

---

*Source: Anton Osika (Lovable, GPT Engineer) via Lenny's Podcast*

## Overview

This skill applies a practical framework for scaling AI systems by treating "getting stuck" as the main reliability bottleneck. It helps teams find exact failure points, fix them with targeted interventions, and drive improvements with tight measurement and fast iteration. Use it when reliability limits the product's usefulness more than raw model capability.

## How this skill works

The skill inspects execution logs, user interactions, and automated tests to identify concrete stuck points (e.g., auth flows, payment validation, API timeouts). It translates those failures into pass/fail metrics, prioritizes the highest-impact blockers, and prescribes focused fixes. Finally, it establishes short feedback cycles to quantitatively tune the system and verify that changes reduce the stuck rate.
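The prioritization step described above can be sketched as scoring each blocker by frequency times severity. The blocker names and the 0-1 `impact` weights here are hypothetical assumptions, not outputs of the skill:

```python
# Hypothetical blocker stats; "impact" is an assumed 0-1 severity weight.
blockers = [
    {"name": "auth_token_refresh", "frequency": 120, "impact": 0.9},
    {"name": "payment_validation", "frequency": 45, "impact": 1.0},
    {"name": "api_timeout", "frequency": 300, "impact": 0.3},
]

def prioritize(items):
    """Rank blockers by expected user impact (frequency * severity)."""
    return sorted(items, key=lambda b: b["frequency"] * b["impact"], reverse=True)

for b in prioritize(blockers):
    print(b["name"], b["frequency"] * b["impact"])
```

A frequency-weighted score keeps rare-but-severe blockers (like payment validation) from being drowned out by noisy low-impact ones, while still surfacing the highest total user pain first.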

## When to use it

- When agents frequently hit dead ends or need manual intervention
- When system reliability, not model quality, limits adoption
- When broad model updates fail to resolve repeatable failures
- When you need measurable improvements in task completion rates
- When fast iteration and A/B style tuning are possible

## Best practices

- Instrument pass/fail signals for each critical flow and track stuck rates over time
- Break down failures to the smallest identifiable step (input validation, auth, API calls)
- Prioritize fixes by user impact and frequency, not by apparent technical elegance
- Run rapid experiments with binary success criteria and short evaluation windows
- Automate regression checks so fixes don't reintroduce old stuck modes
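One way to instrument pass/fail signals per flow, as the first practice above suggests, is a small decorator. This is a minimal in-memory sketch with a hypothetical `billing_validation` flow; a real system would emit to a metrics store:

```python
import functools
from collections import defaultdict

# Minimal in-memory counters; a real system would emit to a metrics backend.
flow_stats = defaultdict(lambda: {"pass": 0, "fail": 0})

def instrument(flow_name):
    """Record a pass/fail signal for every invocation of a critical flow."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                result = fn(*args, **kwargs)
                flow_stats[flow_name]["pass"] += 1
                return result
            except Exception:
                flow_stats[flow_name]["fail"] += 1
                raise
        return wrapper
    return decorator

@instrument("billing_validation")
def validate_billing(info):
    # Hypothetical check: billing info must include a card field.
    if "card" not in info:
        raise ValueError("missing card")
    return True

validate_billing({"card": "4242"})
try:
    validate_billing({})
except ValueError:
    pass

print(flow_stats["billing_validation"])  # {'pass': 1, 'fail': 1}
```

Because the wrapper re-raises, instrumentation never changes the flow's behavior; it only adds the stuck-rate signal.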

## Example use cases

- A customer support agent that repeatedly fails to validate billing info: identify the exact step and add a lightweight rule or clarification
- A pipeline that aborts on third-party API errors: create retry logic and clear fallback behavior, then measure pass rate improvement
- An onboarding flow where new users drop off: instrument each screen, find the blocker, and iterate with small UX or prompt changes
- Scaling internal assistants across teams where different workflows expose different failure modes: treat each workflow as an independent bottleneck to fix
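The retry-with-fallback fix from the pipeline example above can be sketched as follows. The flaky API and its failure count are simulated assumptions:

```python
import time

def with_retries(call, retries=3, base_delay=0.1, fallback=None):
    """Retry a flaky third-party call with exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return call()
        except ConnectionError:
            if attempt < retries - 1:
                time.sleep(base_delay * 2 ** attempt)
    return fallback  # explicit fallback instead of aborting the whole pipeline

# Simulated flaky API: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky_api)
print(result)  # ok
```

Wrapping the call site this way turns a hard abort into a measurable pass/fail outcome, so the pass-rate improvement can be tracked directly.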

## FAQ

**How is this different from general model improvement?**

This skill targets specific operational bottlenecks that cause agents to stop progressing, rather than making broad model changes that may not affect the measured failure modes.

**What metrics should I track first?**

Start with simple pass/fail rates for core tasks, mean time to recovery for stuck sessions, and frequency of each distinct stuck point. These are actionable and easy to measure.
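These three starter metrics can be computed from plain session records. A minimal sketch, assuming a hypothetical schema where each stuck session records its stuck point and minutes to recovery:

```python
from statistics import mean
from collections import Counter

# Hypothetical stuck-session records out of 20 total sessions.
stuck_sessions = [
    {"point": "auth", "recovery_min": 4},
    {"point": "payments", "recovery_min": 10},
    {"point": "auth", "recovery_min": 6},
]
total_sessions = 20

# 1. Pass/fail rate for core tasks.
pass_rate = 1 - len(stuck_sessions) / total_sessions
# 2. Mean time to recovery for stuck sessions (minutes).
mttr = mean(s["recovery_min"] for s in stuck_sessions)
# 3. Frequency of each distinct stuck point.
stuck_counts = Counter(s["point"] for s in stuck_sessions)

print(pass_rate, stuck_counts.most_common(1))
```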