
gemini-sdk-expert skill


This skill helps you harness Gemini SDK expertise for structured outputs, context caching, and multimodal orchestration to build reliable AI apps.

npx playbooks add skill yuniorglez/gemini-elite-core --skill gemini-sdk-expert

---
name: gemini-sdk-expert
id: gemini-sdk-expert
version: 1.3.0
description: "Senior Architect for @google/genai v1.35.0+. Specialist in Structured Intelligence, Context Caching, and Agentic Orchestration in 2026."
---

# 🤖 Skill: gemini-sdk-expert (v1.3.0)

## Executive Summary
`gemini-sdk-expert` is a high-tier skill focused on mastering the Google Gemini ecosystem. In 2026, building with AI isn't just about prompts; it's about **Structural Integrity**, **Context Optimization**, and **Multimodal Orchestration**. This skill provides the blueprint for building ultra-reliable, cost-effective, and powerful AI applications using the latest `@google/genai` standards.

---

## 📋 Table of Contents
1. [Core Capabilities](#core-capabilities)
2. [The "Do Not" List (Anti-Patterns)](#the-do-not-list-anti-patterns)
3. [Quick Start: JSON Enforcement](#quick-start-json-enforcement)
4. [Standard Production Patterns](#standard-production-patterns)
5. [Advanced Agentic Patterns](#advanced-agentic-patterns)
6. [Context Caching Strategy](#context-caching-strategy)
7. [Multimodal Integration](#multimodal-integration)
8. [Safety & Responsible AI](#safety--responsible-ai)
9. [Reference Library](#reference-library)

---

## 🚀 Core Capabilities
- **Strict Structured Output**: Leveraging `responseSchema` for 100% reliable JSON generation.
- **Agentic Function Calling**: Enabling models to interact with private APIs and tools.
- **Long-Form Context Management**: Using Context Caching for massive datasets (2M+ tokens).
- **Native Multimodal Reasoning**: Processing video, audio, and documents as first-class inputs.
- **Latency Optimization**: Strategic model selection (Flash vs. Pro) and streaming responses.

---

## 🚫 The "Do Not" List (Anti-Patterns)

| Anti-Pattern | Why it fails in 2026 | Modern Alternative |
| :--- | :--- | :--- |
| **Regex Parsing** | Fragile and prone to hallucination. | Use **`responseSchema`** (Controlled Output). |
| **Old SDK (`@google/generative-ai`)** | Outdated, lacks 2026 features. | Use **`@google/genai`** exclusively. |
| **Uncached Large Contexts** | Extremely expensive and slow. | Use **Context Caching** for repetitive queries. |
| **Hardcoded API Keys** | Security risk; keys leak via source control. | Load keys from **environment variables** (e.g. `GEMINI_API_KEY`). |
| **Single-Model Bias** | Pro is overkill for simple extraction. | Use **Gemini 3 Flash** for speed/cost tasks. |

---

## ⚡ Quick Start: JSON Enforcement

The #1 rule in 2026: **Structure at the Source**.

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Optional: Set API Version via env
// process.env.GOOGLE_GENAI_API_VERSION = "v1beta1";

// Read the key from the environment; never hardcode it.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const schema = {
  type: Type.OBJECT,
  properties: {
    status: { type: Type.STRING, enum: ["COMPLETE", "PENDING", "ERROR"] },
    summary: { type: Type.STRING },
    priority: { type: Type.NUMBER }
  },
  required: ["status", "summary"]
};

// Always set the MIME type to application/json alongside the schema
const result = await ai.models.generateContent({
  model: "gemini-3-flash",
  contents: [{ role: "user", parts: [{ text: "Evaluate task X..." }] }],
  config: {
    responseMimeType: "application/json",
    responseSchema: schema
  }
});

const data = JSON.parse(result.text ?? "{}");
```

---

## 🛠 Standard Production Patterns

### Pattern A: The Data Extractor (Flash)
Best for processing thousands of documents quickly and cheaply.
- **Model**: `gemini-3-flash`
- **Config**: `temperature` near 0, paired with `responseSchema`, for deterministic extraction.

### Pattern B: The Complex Reasoner (Pro)
Best for architectural decisions, coding assistance, and deep media analysis.
- **Model**: `gemini-3-pro`
- **Config**: Enable **Strict Mode** in schemas for 100% adherence.

---

## 🧩 Advanced Agentic Patterns

### Parallel Function Calling
Reduce round-trips by allowing the model to call multiple tools at once.
*See [References: Function Calling](./references/function-calling.md) for implementation.*

### Semantic Caching
Store and retrieve embeddings of common queries to bypass the LLM for identical requests.
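
A minimal in-memory sketch of the idea, with no SDK calls: in production the vectors would come from an embedding model, and the threshold would be tuned per workload. Class and method names here are illustrative.

```typescript
type CacheEntry = { embedding: number[]; answer: string };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export class SemanticCache {
  private entries: CacheEntry[] = [];
  constructor(private threshold = 0.95) {}

  // Return a cached answer if any stored query is similar enough.
  lookup(embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = -1;
    for (const e of this.entries) {
      const s = cosine(embedding, e.embedding);
      if (s > bestScore) { bestScore = s; best = e; }
    }
    return bestScore >= this.threshold ? best?.answer : undefined;
  }

  store(embedding: number[], answer: string): void {
    this.entries.push({ embedding, answer });
  }
}
```

On a cache hit the LLM call is skipped entirely; on a miss, call the model and `store` the result.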

---

## 💾 Context Caching Strategy

In 2026, we don't re-upload. We cache.

- **Warm-up Phase**: Initial context upload.
- **Persistence Phase**: Referencing the cache via `cachedContent`.
- **Cleanup Phase**: Managing TTLs to optimize storage costs.

*See [References: Context Caching](./references/context-caching.md) for more.*

---

## 📸 Multimodal Integration

Gemini 3 understands the world visually and audibly.

- **Video**: Scene detection and temporal reasoning.
- **Audio**: Sentiment, tone, and environment detection.
- **Document**: Visual layout and OCR.

*See [References: Multimodal Mastery](./references/multimodal-2026.md) for details.*

---

## 📖 Reference Library

Detailed deep-dives into Gemini SDK excellence:

- [**Structured Output**](./references/structured-output.md): Nested schemas and validation.
- [**Function Calling**](./references/function-calling.md): Tools, execution loops, and security.
- [**Context Caching**](./references/context-caching.md): Reducing cost and latency.
- [**Multimodal 2026**](./references/multimodal-2026.md): Video, audio, and PDF mastery.

---

*Updated: January 31, 2026 - 10:45*

Overview

This skill is a senior-architect level guide and toolkit for building high-reliability systems with the `@google/genai` v1.35.0+ ecosystem. It focuses on structured outputs, context caching, and agentic orchestration to deliver predictable, cost-efficient, and multimodal AI applications. The content distills the production patterns, anti-patterns, and actionable strategies engineers use in 2026.

How this skill works

The skill prescribes strict structured output using response schemas to guarantee JSON-typed responses and eliminate parsing failures. It describes context caching patterns that persist large context uploads, allowing fast retrieval and huge token-scale workflows. It also covers agentic function-calling patterns, parallel tool invocation, and model-selection guidelines for latency and cost trade-offs.

When to use it

  • When you need 100% reliable JSON extraction from LLM outputs for downstream systems.
  • When processing thousands of documents where cost and latency are critical.
  • When building agentic workflows that must call private APIs or tools securely.
  • When handling long-form or multimodal data (video, audio, documents) at scale.
  • When you require deterministic behavior for production decisioning or automation.

Best practices

  • Enforce responseSchema and set responseMimeType to application/json at the source.
  • Use context caching: warm-up uploads, reference cachedContent, and manage TTLs.
  • Choose model by role: gemini-3-flash for cost-sensitive extraction, gemini-3-pro for complex reasoning.
  • Avoid regex parsing of LLM text; prefer validated structured schemas instead.
  • Keep secrets out of code; use secure environment variables and versioned API settings.

Example use cases

  • Mass-extraction pipeline: parse metadata from millions of documents with gemini-3-flash and responseSchema.
  • Autonomous agent: allow the model to call multiple tools in parallel to complete complex multi-step tasks.
  • Multimodal analysis: run scene detection on video and combine with document OCR for enriched asset indexing.
  • Semantic caching layer: serve repeated queries from embeddings cache to reduce LLM calls and cost.
  • Production reasoning: use gemini-3-pro with strict schemas for architecture reviews or compliance summaries.

FAQ

How do I guarantee valid JSON from a Gemini model?

Define and enforce a responseSchema and set responseMimeType to application/json; use strict validation on the client before ingesting outputs.
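
The client-side validation step can be sketched as a small guard; the helper name and simplified schema shape below are illustrative, not SDK APIs.

```typescript
// Simplified schema view: only the required-keys check is modeled here.
type SimpleSchema = { required: string[] };

// Parse the model's raw text and verify every required field is present
// before handing the object to downstream systems.
export function validateJson(
  raw: string,
  schema: SimpleSchema
): Record<string, unknown> {
  const parsed = JSON.parse(raw);
  for (const key of schema.required) {
    if (!(key in parsed)) {
      throw new Error(`Missing required field: ${key}`);
    }
  }
  return parsed;
}
```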

When should I use context caching versus re-uploading data each request?

Use context caching for repetitive, large-context workflows (2M+ tokens). Warm-up once, reference cachedContent for subsequent runs, and apply TTLs to control storage cost.