
unity-llm-integration skill


This skill helps Unity developers integrate local and cloud LLMs for AI NPCs, dialogue, and smart behaviors without blocking the main thread.

npx playbooks add skill omer-metin/skills-for-antigravity --skill unity-llm-integration


Files (4)
SKILL.md
---
name: unity-llm-integration
description: Integrating local and cloud LLMs into Unity games for AI NPCs, dialogue, and intelligent behaviors. Use when "unity llm, llmunity, unity ai npc, unity local llm, unity sentis llm, unity chatgpt, unity gpt, c# llm integration, unity, llm, llmunity, sentis, game-ai, npc, csharp, local-llm" mentioned.
---

# Unity LLM Integration

## Identity

You're a Unity developer who has shipped games with LLM-powered features. You've wrestled with
LLMUnity's quirks, debugged iOS library loading failures, optimized model loading to not freeze
the editor, and learned which quantization levels actually work on mobile. You've seen projects
fail because they tried to load 7B models on Android, and succeed because they properly managed
async operations and memory.

You know Unity's threading model and how to keep LLM inference off the main thread. You've dealt
with the pain of build deployment—different architectures, code signing, and platform-specific
library loading. You understand that Unity games need frame-rate stability, so blocking calls
are never acceptable.

Your core principles:
1. Never block the main thread—because Unity needs its 60 FPS
2. Test on target hardware early—because editor performance lies
3. Start small (3B models)—because you can always scale up
4. Use LLMUnity for production—because it handles cross-platform deployment
5. Async everything—because coroutines and UniTask are your friends
6. Memory matters—because mobile devices will kill your app
7. Build early, build often—because LLM issues appear in builds, not editor
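
Principles 1 and 5 in practice can be sketched as follows. This is a minimal illustration, not a specific runtime's API: `RunInferenceBlocking` is a hypothetical stand-in for whatever blocking call your LLM backend exposes (for example, a chat method in LLMUnity).

```csharp
using System.Threading.Tasks;
using UnityEngine;

public class NpcBrain : MonoBehaviour
{
    // Hypothetical stand-in for your runtime's blocking inference call.
    // Replace with the actual API of your chosen backend.
    string RunInferenceBlocking(string prompt) => "...";

    public async void OnPlayerSpoke(string playerLine)
    {
        // Offload the heavy call to the thread pool so Update() keeps running.
        string reply = await Task.Run(() => RunInferenceBlocking(playerLine));

        // Unity's SynchronizationContext resumes the continuation on the
        // main thread, so touching scene objects here is safe.
        Debug.Log($"NPC: {reply}");
    }
}
```

Because Unity installs its own `SynchronizationContext`, the code after `await` runs back on the main thread without any manual dispatching.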


## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

## Overview

This skill integrates local and cloud LLMs into Unity games to power AI NPCs, dynamic dialogue, and intelligent behaviors. It packages best practices for non-blocking inference, platform-specific library loading, and memory-safe model selection. The goal is smooth in-game performance while enabling useful, responsive AI features.

## How this skill works

The skill provides integration patterns and code guidance to load and run LLMs off the main thread, wire responses into Unity systems (AI controllers, dialogue trees, voice pipelines), and manage model assets across desktop and mobile. It enforces async inference, model size checks, and platform-aware library loading to avoid editor or runtime hangs and platform crashes. It also outlines deployment checks so builds match target architecture and memory constraints.
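
Wiring streamed responses into a Unity dialogue system with LLMUnity looks roughly like the sketch below. Treat the method signature as an assumption: LLMUnity's `Chat` call takes streaming and completion callbacks, but the exact parameters vary between versions, so verify against the release you install.

```csharp
using UnityEngine;
using LLMUnity;  // assumed package namespace; verify against your LLMUnity version

public class DialogueDriver : MonoBehaviour
{
    public LLMCharacter llmCharacter;       // configured in the Inspector
    public UnityEngine.UI.Text dialogueText;

    public void Ask(string playerLine)
    {
        // Chat streams partial replies through the first callback; the exact
        // signature may differ between LLMUnity versions, so treat as a sketch.
        _ = llmCharacter.Chat(playerLine,
            partial => dialogueText.text = partial,   // token stream → UI
            () => Debug.Log("Reply complete"));
    }
}
```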

## When to use it

- Adding chat-driven NPCs or procedurally generated dialogue
- Creating context-aware mission guidance or hint systems
- Prototyping AI behaviors with local models to reduce cloud costs
- Shipping LLM features to mobile where memory and CPU are constrained
- When you need deterministic build-time validation for model compatibility
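
Build-time validation of model compatibility can be done with Unity's `IPreprocessBuildWithReport` hook, which runs before every build. The model path and size budgets below are illustrative assumptions; adjust them to your project.

```csharp
#if UNITY_EDITOR
using System.IO;
using UnityEditor;
using UnityEditor.Build;
using UnityEditor.Build.Reporting;

// Fails the build early if the bundled model is too large for the target.
class ModelSizeCheck : IPreprocessBuildWithReport
{
    public int callbackOrder => 0;
    const string ModelPath = "Assets/StreamingAssets/model.gguf"; // assumed location

    public void OnPreprocessBuild(BuildReport report)
    {
        long budget = report.summary.platform == BuildTarget.Android
            ? 2L * 1024 * 1024 * 1024   // ~2 GB budget on Android (illustrative)
            : 8L * 1024 * 1024 * 1024;  // looser budget on desktop

        var info = new FileInfo(ModelPath);
        if (info.Exists && info.Length > budget)
            throw new BuildFailedException(
                $"Model {ModelPath} is {info.Length / (1024 * 1024)} MB, " +
                $"over budget for {report.summary.platform}.");
    }
}
#endif
```

Throwing `BuildFailedException` aborts the build, which is what makes the check deterministic: an oversized model can never slip into a release.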

## Best practices

- Never block the Unity main thread: run inference on background threads or use coroutines/UniTask.
- Test on target hardware early and often; editor performance is not a reliable indicator.
- Start with smaller models (3B-class) and validate quantization levels before scaling up.
- Validate model file size and memory usage for each target architecture; mobile devices may OOM or be killed.
- Use an LLM integration runtime that handles cross-platform library loading and fallbacks.
- Build and test real device builds frequently to catch platform-specific loading or signing issues.
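
One common pattern for keeping frame rate stable: a background thread produces tokens into a thread-safe queue, and a coroutine drains a bounded number per frame. `GenerateTokens` is a hypothetical placeholder for your runtime's streaming API.

```csharp
using System.Collections;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using UnityEngine;

public class TokenStreamer : MonoBehaviour
{
    readonly ConcurrentQueue<string> tokens = new ConcurrentQueue<string>();
    public UnityEngine.UI.Text target;

    public void StartGeneration(string prompt)
    {
        // Producer runs off the main thread and enqueues tokens as they arrive.
        Task.Run(() => GenerateTokens(prompt, t => tokens.Enqueue(t)));
        StartCoroutine(DrainTokens());
    }

    IEnumerator DrainTokens()
    {
        while (true)
        {
            // Apply at most a few tokens per frame to bound main-thread work.
            for (int i = 0; i < 4 && tokens.TryDequeue(out var t); i++)
                target.text += t;
            yield return null;
        }
    }

    // Hypothetical stand-in for your backend's streaming call.
    void GenerateTokens(string prompt, System.Action<string> onToken) { /* runtime call */ }
}
```

Capping the per-frame dequeue count trades a slightly slower on-screen stream for a guaranteed upper bound on main-thread cost.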

## Example use cases

- Local 3B model driving NPC dialogue for offline play, with cloud fallback for complex queries.
- Procedural quest descriptions generated on demand, with async caching to avoid frame spikes.
- Customer support bot in a Unity-based app that prefers on-device inference for privacy.
- Mobile game using quantized model + streaming responses to keep memory and frame-rate stable.
- Hybrid cloud/local pipeline: local model for fast responses, cloud for long-form generation when needed.
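
The hybrid local/cloud pipeline reduces to a routing decision plus a timeout. A minimal sketch, where `LocalInfer` and `CloudInfer` are hypothetical stand-ins for your actual clients and the 3-second timeout is an assumed budget:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Prefer the fast local model; fall back to the cloud when the query is
// flagged complex or the local call exceeds its time budget.
public class HybridLlm
{
    public async Task<string> Complete(string prompt, bool complex)
    {
        if (!complex)
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
            try { return await LocalInfer(prompt, cts.Token); }
            catch (OperationCanceledException) { /* fall through to cloud */ }
        }
        return await CloudInfer(prompt);
    }

    // Hypothetical placeholders for real inference clients.
    Task<string> LocalInfer(string p, CancellationToken ct) => Task.FromResult("local");
    Task<string> CloudInfer(string p) => Task.FromResult("cloud");
}
```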

## FAQ

**Will running an LLM in Unity drop my frame rate?**

Only if inference runs on the main thread. Use background threads, coroutines, or UniTask and stream tokens back to the main thread to apply results without blocking frames.

**Which model sizes work on mobile?**

Start at ~3B with aggressive quantization; larger models often fail due to memory and CPU limits. Always test builds on target devices and profile memory.
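
A rough back-of-envelope for weights-only memory, which explains why ~3B at 4-bit is the practical mobile starting point (KV cache and runtime buffers add to this):

```csharp
// Weights-only RAM: bytes ≈ paramCount × bitsPerWeight / 8.
static class ModelRam
{
    public static long EstimateBytes(long paramCount, int bitsPerWeight)
        => paramCount * bitsPerWeight / 8;
}

// Example: 3B parameters at 4-bit quantization
// → 3,000,000,000 × 4 / 8 = 1.5 GB of weights alone,
// before KV cache, runtime overhead, and the rest of the game.
```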