google-genai-sdk-python skill

/skills/google-genai-sdk-python

This skill helps you write idiomatic Python code using the google-genai SDK for Gemini API and Vertex AI, enabling efficient text generation, multimodal input handling, and media generation.

npx playbooks add skill cnemri/google-genai-skills --skill google-genai-sdk-python

Review the files below or copy the command above to add this skill to your agents.

Files (11)
SKILL.md
1.6 KB
---
name: google-genai-sdk-python
description: Expert guidance for writing Python code using the official Google GenAI SDK (google-genai) for Gemini API and Vertex AI. Use for text generation, multimodal inputs, reasoning, tools, and media generation.
---

# Google GenAI Python SDK Skill

Use this skill to write high-quality, idiomatic Python code for the Gemini API.

## Reference Materials

Identify the user's task and refer to the relevant file:

- **[Setup & Client](references/setup.md)**: Installation, auth, client initialization.
- **[Models](references/models.md)**: Recommended models (Flash, Pro, Lite, Imagen, Veo).
- **[Text Generation](references/text_generation.md)**: Basic inference, streaming, system instructions, safety.
- **[Chat](references/chat.md)**: Multi-turn conversations and history.
- **[Reasoning](references/reasoning.md)**: Thinking config (`thinking_level` / `thinking_budget`), thought signatures.
- **[Structured Output](references/structured_output.md)**: JSON schemas, Pydantic models, Enums.
- **[Multimodal Inputs](references/multimodal_inputs.md)**: Images, audio, video, PDFs, media resolution.
- **[Tools](references/tools.md)**: Function calling, code execution, Google Search grounding.
- **[Media Generation](references/media_generation.md)**: Image generation/editing (Imagen), video generation (Veo).
- **[Source Code](references/source_code.md)**: Raw SDK source code for deep inspection.

## Core Principles

1.  **Unified SDK**: Always use `google-genai` (not the deprecated `google-generativeai`).
2.  **Stateless Models**: Use `client.models` for single requests.
3.  **Stateful Chats**: Use `client.chats` for conversations.
4.  **Types**: Import from `google.genai.types`.

Overview

This skill provides expert guidance for writing idiomatic Python code with the official google-genai SDK to interact with Gemini models and Vertex AI. It focuses on common developer tasks: setup, model selection, text and multimodal generation, reasoning, structured outputs, tools, and media generation. Use it to produce robust examples, patterns, and practical tips that map directly to the SDK surface.

How this skill works

This skill inspects the user's intent and recommends concrete code patterns built on the google-genai client APIs: client.models for one-off inference and client.chats for stateful conversations. It selects appropriate model families (Flash, Pro, Lite, Imagen, Veo) and shows how to configure prompts, streaming, thinking settings, structured output schemas, and multimodal inputs. It also provides examples for function calling, code execution, Google Search grounding, and media generation workflows.

When to use it

  • Building single-request text generation or streaming responses with Gemini.
  • Creating multi-turn chat experiences that require conversation state or memory.
  • Generating or editing images and videos using Imagen and Veo models.
  • Parsing model outputs into typed JSON or Pydantic models for reliability.
  • Combining model reasoning with external tools like search or code execution.

Best practices

  • Always initialize and reuse a single google-genai client per application process.
  • Choose models by capability: Lite for low-cost tasks, Flash/Pro for robust reasoning, Imagen/Veo for media.
  • Prefer stateless client.models requests for isolated inferences and client.chats for conversational flows.
  • Use structured output schemas (JSON schema or Pydantic) to validate and parse responses deterministically.
  • Employ thinking config (thinking_level and thinking_budget) only when you need deeper chain-of-thought or constrained reasoning.

Example use cases

  • A web API that streams token-by-token assistant responses for a chat UI using client.models streaming.
  • A conversational agent that maintains context, invokes tools, and grounds facts with Google Search via client.chats and function calling.
  • An image editing pipeline that sends a reference image and edit instructions to Imagen, then validates the returned asset.
  • A data-extraction routine that requests structured JSON output and validates it with Pydantic models before ingestion.
  • A multimodal assistant that accepts images, audio, or PDFs as inputs and synthesizes a summarized report.

FAQ

Should I use client.models or client.chats?

Use client.models for single, stateless requests and client.chats for multi-turn conversations where you must maintain history or agent state.

How do I ensure deterministic structured outputs?

Define a strict JSON schema or Pydantic model and request the model to emit that schema. Validate and re-request when the output fails validation.
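That validate-and-re-request loop can be sketched as a small helper. `Record` is a hypothetical schema, and `ask_model` stands in for any wrapper around client.models.generate_content:

```python
import json
from pydantic import BaseModel, ValidationError

class Record(BaseModel):  # hypothetical target schema
    name: str
    score: int

def parse_with_retry(ask_model, max_attempts: int = 3) -> Record:
    """ask_model() is any zero-argument callable returning the model's raw
    JSON text. Re-request until the payload validates against Record."""
    last_error = None
    for _ in range(max_attempts):
        raw = ask_model()
        try:
            return Record.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as exc:
            last_error = exc  # malformed output: re-request
    raise ValueError(f"no valid output after {max_attempts} attempts") from last_error

# Stub in place of a real model call:
record = parse_with_retry(lambda: '{"name": "ada", "score": 7}')
print(record)
```

Because the helper only depends on the callable, it is equally easy to unit-test with stubs and to wire to a real client in production.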