
gemini-sdk-expert skill


This skill helps you harness Gemini SDK expertise for structured outputs, context caching, and multimodal orchestration to build reliable AI apps.

npx playbooks add skill yuniorglez/gemini-elite-core --skill gemini-sdk-expert

---
name: gemini-sdk-expert
id: gemini-sdk-expert
version: 1.3.0
description: "Senior Architect for @google/genai v1.35.0+. Specialist in Structured Intelligence, Context Caching, and Agentic Orchestration in 2026."
---

# 🤖 Skill: gemini-sdk-expert (v1.3.0)

## Executive Summary
`gemini-sdk-expert` is a high-tier skill focused on mastering the Google Gemini ecosystem. In 2026, building with AI isn't just about prompts; it's about **Structural Integrity**, **Context Optimization**, and **Multimodal Orchestration**. This skill provides the blueprint for building ultra-reliable, cost-effective, and powerful AI applications using the latest `@google/genai` standards.

---

## 📋 Table of Contents
1. [Core Capabilities](#core-capabilities)
2. [The "Do Not" List (Anti-Patterns)](#the-do-not-list-anti-patterns)
3. [Quick Start: JSON Enforcement](#quick-start-json-enforcement)
4. [Standard Production Patterns](#standard-production-patterns)
5. [Advanced Agentic Patterns](#advanced-agentic-patterns)
6. [Context Caching Strategy](#context-caching-strategy)
7. [Multimodal Integration](#multimodal-integration)
8. [Safety & Responsible AI](#safety--responsible-ai)
9. [Reference Library](#reference-library)

---

## 🚀 Core Capabilities
- **Strict Structured Output**: Leveraging `responseSchema` for 100% reliable JSON generation.
- **Agentic Function Calling**: Enabling models to interact with private APIs and tools.
- **Long-Form Context Management**: Using Context Caching for massive datasets (2M+ tokens).
- **Native Multimodal Reasoning**: Processing video, audio, and documents as first-class inputs.
- **Latency Optimization**: Strategic model selection (Flash vs. Pro) and streaming responses.

---

## 🚫 The "Do Not" List (Anti-Patterns)

| Anti-Pattern | Why it fails in 2026 | Modern Alternative |
| :--- | :--- | :--- |
| **Regex Parsing** | Fragile and prone to hallucination. | Use **`responseSchema`** (Controlled Output). |
| **Old SDK (`@google/generative-ai`)** | Outdated, lacks 2026 features. | Use **`@google/genai`** exclusively. |
| **Uncached Large Contexts** | Extremely expensive and slow. | Use **Context Caching** for repetitive queries. |
| **Hardcoded API Keys** | Security risk; keys leak via source control. | Load keys from **environment variables** (e.g. `GEMINI_API_KEY`). |
| **Single-Model Bias** | Pro is overkill for simple extraction. | Use **Gemini 3 Flash** for speed/cost tasks. |

---

## ⚡ Quick Start: JSON Enforcement

The #1 rule in 2026: **Structure at the Source**.

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Optional: Set API Version via env
// process.env.GOOGLE_GENAI_API_VERSION = "v1beta1";

// Read the key from the environment; never hardcode it.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const schema = {
  type: Type.OBJECT,
  properties: {
    status: { type: Type.STRING, enum: ["COMPLETE", "PENDING", "ERROR"] },
    summary: { type: Type.STRING },
    priority: { type: Type.NUMBER }
  },
  required: ["status", "summary"]
};

// Always set the MIME type to application/json alongside the schema
const result = await ai.models.generateContent({
  model: "gemini-3-flash",
  contents: [{ role: "user", parts: [{ text: "Evaluate task X..." }] }],
  config: {
    responseMimeType: "application/json",
    responseSchema: schema
  }
});

const data = JSON.parse(result.text ?? "{}");
```

---

## 🛠 Standard Production Patterns

### Pattern A: The Data Extractor (Flash)
Best for processing thousands of documents quickly and cheaply.
- **Model**: `gemini-3-flash`
- **Config**: `temperature` near 0, paired with `responseSchema`, for deterministic extraction.

### Pattern B: The Complex Reasoner (Pro)
Best for architectural decisions, coding assistance, and deep media analysis.
- **Model**: `gemini-3-pro`
- **Config**: Enable **Strict Mode** in schemas for 100% adherence.

---

## 🧩 Advanced Agentic Patterns

### Parallel Function Calling
Reduce round-trips by allowing the model to call multiple tools at once.
*See [References: Function Calling](./references/function-calling.md) for implementation.*

### Semantic Caching
Store and retrieve embeddings of common queries to bypass the LLM for identical requests.
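
A minimal in-memory sketch of the idea, with no SDK calls: in production the vectors would come from an embedding model, and the threshold would be tuned per workload. Class and method names here are illustrative.

```typescript
type CacheEntry = { embedding: number[]; answer: string };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export class SemanticCache {
  private entries: CacheEntry[] = [];
  constructor(private threshold = 0.95) {}

  // Return a cached answer if any stored query is similar enough.
  lookup(embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = -1;
    for (const e of this.entries) {
      const s = cosine(embedding, e.embedding);
      if (s > bestScore) { bestScore = s; best = e; }
    }
    return bestScore >= this.threshold ? best?.answer : undefined;
  }

  store(embedding: number[], answer: string): void {
    this.entries.push({ embedding, answer });
  }
}
```

On a cache hit the LLM call is skipped entirely; on a miss, call the model and `store` the result.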

---

## 💾 Context Caching Strategy

In 2026, we don't re-upload. We cache.

- **Warm-up Phase**: Initial context upload.
- **Persistence Phase**: Referencing the cache via `cachedContent`.
- **Cleanup Phase**: Managing TTLs to optimize storage costs.

*See [References: Context Caching](./references/context-caching.md) for more.*

---

## 📸 Multimodal Integration

Gemini 3 understands the world visually and audibly.

- **Video**: Scene detection and temporal reasoning.
- **Audio**: Sentiment, tone, and environment detection.
- **Document**: Visual layout and OCR.

*See [References: Multimodal Mastery](./references/multimodal-2026.md) for details.*

---

## 📖 Reference Library

Detailed deep-dives into Gemini SDK excellence:

- [**Structured Output**](./references/structured-output.md): Nested schemas and validation.
- [**Function Calling**](./references/function-calling.md): Tools, execution loops, and security.
- [**Context Caching**](./references/context-caching.md): Reducing cost and latency.
- [**Multimodal 2026**](./references/multimodal-2026.md): Video, audio, and PDF mastery.

---

*Updated: January 31, 2026 - 10:45*

Overview

This skill is a senior-architect level guide and toolkit for building high-reliability systems with the `@google/genai` v1.35.0+ ecosystem. It focuses on structured outputs, context caching, and agentic orchestration to deliver predictable, cost-efficient, and multimodal AI applications. The content distills the production patterns, anti-patterns, and actionable strategies engineers use in 2026.

How this skill works

The skill prescribes strict structured output using response schemas to guarantee JSON-typed responses and eliminate parsing failures. It describes context caching patterns that persist large context uploads, allowing fast retrieval and huge token-scale workflows. It also covers agentic function-calling patterns, parallel tool invocation, and model-selection guidelines for latency and cost trade-offs.

When to use it

  • When you need 100% reliable JSON extraction from LLM outputs for downstream systems.
  • When processing thousands of documents where cost and latency are critical.
  • When building agentic workflows that must call private APIs or tools securely.
  • When handling long-form or multimodal data (video, audio, documents) at scale.
  • When you require deterministic behavior for production decisioning or automation.

Best practices

  • Enforce responseSchema and set responseMimeType to application/json at the source.
  • Use context caching: warm-up uploads, reference cachedContent, and manage TTLs.
  • Choose model by role: gemini-3-flash for cost-sensitive extraction, gemini-3-pro for complex reasoning.
  • Avoid regex parsing of LLM text; prefer validated structured schemas instead.
  • Keep secrets out of code; use secure environment variables and versioned API settings.

Example use cases

  • Mass-extraction pipeline: parse metadata from millions of documents with gemini-3-flash and responseSchema.
  • Autonomous agent: allow the model to call multiple tools in parallel to complete complex multi-step tasks.
  • Multimodal analysis: run scene detection on video and combine with document OCR for enriched asset indexing.
  • Semantic caching layer: serve repeated queries from embeddings cache to reduce LLM calls and cost.
  • Production reasoning: use gemini-3-pro with strict schemas for architecture reviews or compliance summaries.

FAQ

How do I guarantee valid JSON from a Gemini model?

Define and enforce a responseSchema and set responseMimeType to application/json; use strict validation on the client before ingesting outputs.
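
The client-side validation step can be sketched as a small guard; the helper name and simplified schema shape below are illustrative, not SDK APIs.

```typescript
// Simplified schema view: only the required-keys check is modeled here.
type SimpleSchema = { required: string[] };

// Parse the model's raw text and verify every required field is present
// before handing the object to downstream systems.
export function validateJson(
  raw: string,
  schema: SimpleSchema
): Record<string, unknown> {
  const parsed = JSON.parse(raw);
  for (const key of schema.required) {
    if (!(key in parsed)) {
      throw new Error(`Missing required field: ${key}`);
    }
  }
  return parsed;
}
```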

When should I use context caching versus re-uploading data each request?

Use context caching for repetitive, large-context workflows (2M+ tokens). Warm-up once, reference cachedContent for subsequent runs, and apply TTLs to control storage cost.