
z-ai-api skill

/z-ai-api

This skill simplifies integrating Z.ai APIs for chat, vision, image, video, and translation tasks, providing streamlined prompts and tool-usage patterns.

npx playbooks add skill jrajasekera/claude-skills --skill z-ai-api

SKILL.md
---
name: z-ai-api
description: |
  Z.ai API integration for building applications with GLM models. Use when working with Z.ai/ZhipuAI APIs for: (1) Chat completions with GLM-4.7/4.6/4.5 models, (2) Vision/multimodal tasks with GLM-4.6V, (3) Image generation with GLM-Image or CogView-4, (4) Video generation with CogVideoX-3 or Vidu models, (5) Audio transcription with GLM-ASR-2512, (6) Function calling and tool use, (7) Web search integration, (8) Translation, slide/poster generation agents. Triggers: Z.ai, ZhipuAI, GLM, BigModel, Zhipu, CogVideoX, CogView, Vidu.
---

# Z.ai API Skill

## Quick Reference

**Base URL:** `https://api.z.ai/api/paas/v4`
**Coding Plan URL:** `https://api.z.ai/api/coding/paas/v4`
**Auth:** `Authorization: Bearer YOUR_API_KEY`
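
If you are not using an SDK, here is a minimal raw-HTTP sketch of the auth header and the `/chat/completions` endpoint using `requests` (the `ZAI_API_KEY` environment variable name is an assumption; the response shape follows the OpenAI-compatible examples below):

```python
import os
import requests

# Minimal raw-HTTP call: POST a chat completion with bearer auth.
# ZAI_API_KEY is an assumed environment variable name.
BASE_URL = "https://api.z.ai/api/paas/v4"
headers = {"Authorization": f"Bearer {os.environ['ZAI_API_KEY']}"}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=headers,
    json={
        "model": "glm-4.7",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```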

## Core Endpoints

| Endpoint | Purpose |
|----------|---------|
| `/chat/completions` | Text/vision chat |
| `/images/generations` | Image generation |
| `/videos/generations` | Video generation (async) |
| `/audio/transcriptions` | Speech-to-text |
| `/web_search` | Web search |
| `/async-result/{id}` | Poll async tasks |
| `/v1/agents` | Translation, slides, effects |

## Model Selection

**Chat (pick by need):**
- `glm-4.7` — Latest flagship, best quality, agentic coding
- `glm-4.7-flash` — Fast, high quality
- `glm-4.6` — Reliable general use
- `glm-4.5-flash` — Fastest, lower cost

**Vision:**
- `glm-4.6v` — Best multimodal (images, video, files)
- `glm-4.6v-flash` — Fast vision

**Media:**
- `glm-image` — High-quality images (HD, ~20s)
- `cogview-4-250304` — Fast images (~5-10s)
- `cogvideox-3` — Video, up to 4K, 5-10s
- `viduq1-text/image` — Vidu video generation

## Implementation Patterns

### Basic Chat
```python
from zai import ZaiClient

client = ZaiClient(api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

### OpenAI SDK Compatibility
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/"
)
# Use exactly like OpenAI SDK
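# For example, the same request as in Basic Chat above:
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)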
```

### Streaming
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    stream=True
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Function Calling
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

# Handle tool_calls in response.choices[0].message.tool_calls
```
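
A minimal sketch of the follow-up round trip, assuming the OpenAI-compatible tool-message format (`role: "tool"` plus `tool_call_id`); `get_weather` is a hypothetical local implementation:

```python
import json

message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)

    result = get_weather(**args)  # hypothetical local function

    follow_up = client.chat.completions.create(
        model="glm-4.7",
        messages=[
            {"role": "user", "content": "Weather in Tokyo?"},
            message,  # assistant turn with the tool call (convert to a dict if your SDK requires it)
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            },
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
```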

### Vision (Images/Video/Files)
```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```
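
For local files, OpenAI-compatible vision endpoints commonly accept base64 image data in the same `image_url` field; the data-URL form below is an assumption, so verify the exact format (data URL vs. raw base64) against the Z.ai vision docs:

```python
import base64

# Encode a local image; the data-URL form is assumed here.
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
print(response.choices[0].message.content)
```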

### Image Generation
```python
response = client.images.generate(
    model="glm-image",
    prompt="A serene mountain at sunset",
    size="1280x1280",
    quality="hd"
)
print(response.data[0].url)  # Expires in 30 days
```
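
Because the returned URL expires, a small sketch for persisting the asset locally (file name and extension are arbitrary; the bytes are written as returned):

```python
import requests

# Download the generated image before the URL expires.
image_url = response.data[0].url
img = requests.get(image_url, timeout=60)
img.raise_for_status()
with open("generated_image.png", "wb") as f:
    f.write(img.content)
```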

### Video Generation (Async)
```python
# Submit
response = client.videos.generate(
    model="cogvideox-3",
    prompt="A cat playing with yarn",
    size="1920x1080",
    duration=5
)
task_id = response.id

# Poll for result
import time
while True:
    result = client.async_result.get(task_id)
    if result.task_status == "SUCCESS":
        print(result.video_result[0].url)
        break
    time.sleep(5)
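    # A production poller should also stop on failure statuses and cap the
    # total wait time instead of looping forever (status names vary; check the API docs).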
```

### Web Search Integration
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Latest AI news?"}],
    tools=[{
        "type": "web_search",
        "web_search": {
            "enable": True,
            "search_result": True
        }
    }]
)
# Access response.web_search for sources
```

### Thinking Mode (Chain-of-Thought)
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    thinking={"type": "enabled"},
    stream=True  # Recommended with thinking
)
# Access reasoning_content in response
```
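
A sketch of consuming that stream, assuming reasoning tokens arrive on `delta.reasoning_content` alongside the regular `delta.content` (verify the attribute name in `references/chat-completions.md`):

```python
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Reasoning tokens (assumed attribute name) stream separately from the answer.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```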

## Key Parameters

| Parameter | Values | Notes |
|-----------|--------|-------|
| `temperature` | 0.0-1.0 | Defaults: 1.0 (GLM-4.7), 0.6 (GLM-4.5) |
| `top_p` | 0.01-1.0 | Default ~0.95 |
| `max_tokens` | varies | Max: 128K (GLM-4.7), 96K (GLM-4.5) |
| `stream` | bool | Enable SSE streaming |
| `response_format` | `{"type": "json_object"}` | Force JSON output (see sketch below) |
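
A short sketch of forcing JSON output via `response_format` and parsing the result (asking for JSON in the prompt as well tends to help):

```python
import json

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Return a JSON object with keys 'city' and 'population' for Tokyo."}
    ],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
print(data["city"], data["population"])
```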

## Error Handling

- **429**: Rate limited — implement exponential backoff
- **401**: Bad API key — verify credentials
- **sensitive**: Content filtered — modify input

```python
finish_reason = response.choices[0].finish_reason
if finish_reason == "tool_calls":
    ...  # Execute the requested function(s) and continue the conversation
elif finish_reason == "length":
    ...  # Output was truncated: increase max_tokens or shorten the input
elif finish_reason == "sensitive":
    ...  # Content was filtered: adjust the input
```
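
For 429s, a minimal exponential-backoff sketch; `is_rate_limit_error` is a hypothetical helper to adapt to however your SDK surfaces HTTP 429:

```python
import random
import time

def create_with_backoff(client, max_retries=5, **kwargs):
    """Retry chat completions on rate limits with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as exc:
            # is_rate_limit_error is hypothetical: check the exception's
            # HTTP status or error code for 429 however your SDK exposes it.
            if not is_rate_limit_error(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())
```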

## Reference Files

For detailed API specifications, consult:
- `references/chat-completions.md` — Full chat API, parameters, models
- `references/tools-and-functions.md` — Function calling, web search, retrieval
- `references/media-generation.md` — Image, video, audio APIs
- `references/agents.md` — Translation, slides, effects agents
- `references/error-codes.md` — Error handling, rate limits

Overview

This skill integrates the Z.ai (ZhipuAI) API to build chat, multimodal, media, and agent workflows using GLM family models. It exposes endpoints for chat completions, vision tasks, image/video/audio generation and transcription, web search, and function/tool calling. Use it to prototype or productize features that require high-quality language, vision, and media generation capabilities.

How this skill works

The skill calls Z.ai REST endpoints (base URL https://api.z.ai/api/paas/v4) with an Authorization bearer key to request chat completions, image/video/audio generation, or asynchronous media tasks. It supports model selection (glm-4.7, glm-4.6v, glm-image, cogvideox-3, etc.), streaming responses, function/tool invocation, web search integration, and polling of async results. Responses include content, tool_call metadata, and media URLs that may be time-limited.

When to use it

  • Build chatbots or assistants that need advanced reasoning and chain-of-thought (glm-4.7).
  • Add multimodal understanding (image/video inputs) or captioning using glm-4.6v.
  • Generate high-quality images or fast thumbnails with glm-image or cogview variants.
  • Create videos asynchronously (cogvideox-3, Vidu) and poll async-result for completion.
  • Transcribe audio with GLM-ASR models and integrate web search or function/tool calls for grounded answers.
  • Implement function calling and tool orchestration for agents (translation, slides, effects).

Best practices

  • Choose models by cost/latency tradeoffs: use flash variants for speed, flagship models for quality.
  • Use streaming for long outputs or thinking/chain-of-thought to reduce perceived latency.
  • Implement exponential backoff and retries for 429 rate limits and handle 401 for credential issues.
  • Set response_format and max_tokens deliberately; force JSON output when you need structured results.
  • For async media (video), submit tasks then poll /async-result/{id} and handle transient states.
  • Validate and sanitize user inputs before sending to avoid content filtering or sensitive triggers.

Example use cases

  • Customer support assistant using glm-4.7 with function calling to query backend systems.
  • Image captioning and multimodal Q&A using glm-4.6v with image_url inputs.
  • Batch image generation for marketing assets with glm-image and size/quality options.
  • Automated short video generation pipeline: submit to cogvideox-3, poll async-result, store result URLs.
  • Audio meeting transcription using GLM-ASR followed by summarization via chat completions.

FAQ

How do I poll for video generation results?

Submit the generation request, capture the returned task id, and poll the /async-result/{id} endpoint until task_status == "SUCCESS". Back off between polls to avoid rate limits.
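
A minimal polling helper sketch, reusing the `client.async_result.get` call from the video example above ("SUCCESS" is the terminal status shown there; add failure-status handling for your setup):

```python
import time

def wait_for_async_result(client, task_id, max_wait=600):
    """Poll the async task with gentle backoff until success or timeout."""
    delay, waited = 2, 0
    while waited < max_wait:
        result = client.async_result.get(task_id)
        if result.task_status == "SUCCESS":
            return result
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, 30)  # back off, capping the interval at 30 seconds
    raise TimeoutError(f"Task {task_id} did not complete within {max_wait} seconds")
```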

Can I use OpenAI-compatible SDKs with this API?

Yes. The API supports OpenAI-compatible calls by setting the base_url to https://api.z.ai/api/paas/v4 and passing your Z.ai API key, allowing many OpenAI SDKs to work with minimal changes.