home / skills / jezweb / claude-skills / google-gemini-file-search
/skills/google-gemini-file-search
This skill helps you set up and operate Google Gemini File Search for document Q&A with automatic chunking, embeddings, and citations.
npx playbooks add skill jezweb/claude-skills --skill google-gemini-file-searchReview the files below or copy the command above to add this skill to your agents.
---
name: google-gemini-file-search
description: |
Build document Q&A with Gemini File Search - fully managed RAG with automatic chunking, embeddings, and citations. Upload 100+ file formats, query with natural language.
Use when: document Q&A, searchable knowledge bases, semantic search. Troubleshoot: document immutability, storage quota (3x), chunking config, metadata limits (20 max), polling timeouts, displayName dropped (Blob uploads), grounding lost (JSON mode), tool conflicts (googleSearch + fileSearch).
user-invocable: true
allowed-tools:
- Bash
- Read
- Write
- Glob
- Grep
- WebFetch
---
# Google Gemini File Search Setup
## Overview
Google Gemini File Search is a fully managed RAG system. Upload documents (100+ formats: PDF, Word, Excel, code) and query with natural language—automatic chunking, embeddings, semantic search, and citations.
**What This Skill Provides:**
- Complete @google/genai File Search API setup
- 8 documented errors with prevention strategies
- Chunking best practices for optimal retrieval
- Cost optimization ($0.15/1M tokens indexing, 3x storage multiplier)
- Cloudflare Workers + Next.js integration templates
## Prerequisites
### 1. Google AI API Key
Create an API key at https://aistudio.google.com/apikey
**Free Tier Limits:**
- 1 GB storage (total across all file search stores)
- 1,500 requests per day
- 1 million tokens per minute
**Paid Tier Pricing:**
- Indexing: $0.15 per 1M input tokens (one-time)
- Storage: Free (Tier 1: 10 GB, Tier 2: 100 GB, Tier 3: 1 TB)
- Query-time embeddings: Free (retrieved context counts as input tokens)
### 2. Node.js Environment
**Minimum Version:** Node.js 18+ (v20+ recommended)
```bash
node --version # Should be >=18.0.0
```
### 3. Install @google/genai SDK
```bash
npm install @google/genai
# or
pnpm add @google/genai
# or
yarn add @google/genai
```
**Current Stable Version:** 1.30.0+ (verify with `npm view @google/genai version`)
**⚠️ Important:** File Search API requires **@google/genai v1.29.0 or later**. Earlier versions do not support File Search. The API was added in v1.29.0 (November 5, 2025).
### 4. TypeScript Configuration (Optional but Recommended)
```json
{
"compilerOptions": {
"target": "ES2020",
"module": "ESNext",
"moduleResolution": "node",
"esModuleInterop": true,
"strict": true,
"skipLibCheck": true
}
}
```
## Common Errors Prevented
This skill prevents 12 common errors encountered when implementing File Search:
### Error 1: Document Immutability
**Symptom:**
```
Error: Documents cannot be modified after indexing
```
**Cause:** Documents are immutable once indexed. There is no PATCH or UPDATE operation.
**Prevention:**
Use the delete+re-upload pattern for updates:
```typescript
// ❌ WRONG: Trying to update document (no such API)
await ai.fileSearchStores.documents.update({
name: documentName,
customMetadata: { version: '2.0' }
})
// ✅ CORRECT: Delete then re-upload
const docs = await ai.fileSearchStores.documents.list({
parent: fileStore.name
})
const oldDoc = docs.documents.find(d => d.displayName === 'manual.pdf')
if (oldDoc) {
await ai.fileSearchStores.documents.delete({
name: oldDoc.name,
force: true
})
}
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('manual-v2.pdf'),
config: { displayName: 'manual.pdf' }
})
```
**Source:** https://ai.google.dev/api/file-search/documents
### Error 2: Storage Quota Exceeded
**Symptom:**
```
Error: Quota exceeded. Expected 1GB limit, but 3.2GB used.
```
**Cause:** Storage calculation includes input files + embeddings + metadata. Total storage ≈ 3x input size.
**Prevention:**
Calculate storage before upload:
```typescript
// ❌ WRONG: Assuming storage = file size
const fileSize = fs.statSync('data.pdf').size // 500 MB
// Expect 500 MB usage → WRONG
// ✅ CORRECT: Account for 3x multiplier
const fileSize = fs.statSync('data.pdf').size // 500 MB
const estimatedStorage = fileSize * 3 // 1.5 GB (embeddings + metadata)
console.log(`Estimated storage: ${estimatedStorage / 1e9} GB`)
// Check if within quota before upload
if (estimatedStorage > 1e9) {
console.warn('⚠️ File may exceed free tier 1 GB limit')
}
```
**Source:** https://blog.google/technology/developers/file-search-gemini-api/
### Error 3: Incorrect Chunking Configuration
**Symptom:**
Poor retrieval quality, irrelevant results, or context cutoff mid-sentence.
**Cause:** Default chunking may not be optimal for your content type.
**Prevention:**
Use recommended chunking strategy:
```typescript
// ❌ WRONG: Using defaults without testing
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('docs.pdf')
// Default chunking may be too large or too small
})
// ✅ CORRECT: Configure chunking for precision
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('docs.pdf'),
config: {
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 500, // Smaller chunks = more precise retrieval
maxOverlapTokens: 50 // 10% overlap prevents context loss
}
}
}
})
```
**Chunking Guidelines:**
- **Technical docs/code:** 500 tokens/chunk, 50 overlap
- **Prose/articles:** 800 tokens/chunk, 80 overlap
- **Legal/contracts:** 300 tokens/chunk, 30 overlap (high precision)
**Source:** https://www.philschmid.de/gemini-file-search-javascript
### Error 4: Metadata Limits Exceeded
**Symptom:**
```
Error: Maximum 20 custom metadata key-value pairs allowed
```
**Cause:** Each document can have at most 20 metadata fields.
**Prevention:**
Design compact metadata schema:
```typescript
// ❌ WRONG: Too many metadata fields
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('doc.pdf'),
config: {
customMetadata: {
doc_type: 'manual',
version: '1.0',
author: 'John Doe',
department: 'Engineering',
created_date: '2025-01-01',
// ... 18 more fields → Error!
}
}
})
// ✅ CORRECT: Use hierarchical keys or JSON strings
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('doc.pdf'),
config: {
customMetadata: {
doc_type: 'manual',
version: '1.0',
author_dept: 'John Doe|Engineering', // Combine related fields
dates: JSON.stringify({ // Or use JSON for complex data
created: '2025-01-01',
updated: '2025-01-15'
})
}
}
})
```
**Source:** https://ai.google.dev/api/file-search/documents
### Error 5: Indexing Cost Surprises
**Symptom:**
Unexpected bill for $375 after uploading 10 GB of documents.
**Cause:** Indexing costs are one-time but calculated per input token ($0.15/1M tokens).
**Prevention:**
Estimate costs before indexing:
```typescript
// ❌ WRONG: No cost estimation
await uploadAllDocuments(fileStore.name, './data') // 10 GB uploaded → $375 surprise
// ✅ CORRECT: Calculate costs upfront
const totalSize = getTotalDirectorySize('./data') // 10 GB
const estimatedTokens = (totalSize / 4) // Rough estimate: 1 token ≈ 4 bytes
const indexingCost = (estimatedTokens / 1e6) * 0.15
console.log(`Estimated indexing cost: $${indexingCost.toFixed(2)}`)
console.log(`Estimated storage: ${(totalSize * 3) / 1e9} GB`)
// Confirm before proceeding
const proceed = await confirm(`Proceed with indexing? Cost: $${indexingCost.toFixed(2)}`)
if (proceed) {
await uploadAllDocuments(fileStore.name, './data')
}
```
**Cost Examples:**
- 1 GB text ≈ 250M tokens = $37.50 indexing
- 100 MB PDF ≈ 25M tokens = $3.75 indexing
- 10 MB code ≈ 2.5M tokens = $0.38 indexing
**Source:** https://ai.google.dev/pricing
### Error 6: Not Polling Operation Status
**Symptom:**
Query returns no results immediately after upload, or incomplete indexing.
**Cause:** File uploads are processed asynchronously. Must poll operation until `done: true`.
**Prevention:**
Always poll operation status with timeout and fallback:
```typescript
// ❌ WRONG: Assuming upload is instant
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('large.pdf')
})
// Immediately query → No results!
// ✅ CORRECT: Poll until indexing complete with timeout
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('large.pdf')
})
// Poll with timeout and fallback
const MAX_POLL_TIME = 60000 // 60 seconds
const POLL_INTERVAL = 1000
let elapsed = 0
while (!operation.done && elapsed < MAX_POLL_TIME) {
await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL))
elapsed += POLL_INTERVAL
try {
operation = await ai.operations.get({ name: operation.name })
console.log(`Indexing progress: ${operation.metadata?.progress || 'processing...'}`)
} catch (error) {
console.warn('Polling failed, assuming complete:', error)
break
}
}
if (operation.error) {
throw new Error(`Indexing failed: ${operation.error.message}`)
}
// ⚠️ Warning: operations.get() can be unreliable for large files
// If timeout reached, verify document exists manually
if (elapsed >= MAX_POLL_TIME) {
console.warn('Polling timeout - verifying document manually')
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
const uploaded = docs.documents?.find(d => d.displayName === 'large.pdf')
if (uploaded) {
console.log('✅ Document found despite polling timeout')
} else {
throw new Error('Upload failed - document not found')
}
}
console.log('✅ Indexing complete:', operation.response?.displayName)
```
**Source:** https://ai.google.dev/api/file-search/file-search-stores#uploadtofilesearchstore, [GitHub Issue #1211](https://github.com/googleapis/js-genai/issues/1211)
### Error 7: Forgetting Force Delete
**Symptom:**
```
Error: Cannot delete store with documents. Set force=true.
```
**Cause:** Stores with documents require `force: true` to delete (prevents accidental deletion).
**Prevention:**
Always use `force: true` when deleting non-empty stores:
```typescript
// ❌ WRONG: Trying to delete store with documents
await ai.fileSearchStores.delete({
name: fileStore.name
})
// Error: Cannot delete store with documents
// ✅ CORRECT: Use force delete
await ai.fileSearchStores.delete({
name: fileStore.name,
force: true // Deletes store AND all documents
})
// Alternative: Delete documents first
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
for (const doc of docs.documents || []) {
await ai.fileSearchStores.documents.delete({
name: doc.name,
force: true
})
}
await ai.fileSearchStores.delete({ name: fileStore.name })
```
**Source:** https://ai.google.dev/api/file-search/file-search-stores#delete
### Error 8: Using Unsupported Models
**Symptom:**
```
Error: File Search is only supported for Gemini 3 Pro and Flash models
```
**Cause:** File Search requires Gemini 3 Pro or Gemini 3 Flash. Gemini 2.x and 1.5 models are not supported.
**Prevention:**
Always use Gemini 3 models:
```typescript
// ❌ WRONG: Using Gemini 1.5 model
const response = await ai.models.generateContent({
model: 'gemini-1.5-pro', // Not supported!
contents: 'What is the installation procedure?',
config: {
tools: [{
fileSearch: { fileSearchStoreNames: [fileStore.name] }
}]
}
})
// ✅ CORRECT: Use Gemini 3 models
const response = await ai.models.generateContent({
model: 'gemini-3-flash', // ✅ Supported (fast, cost-effective)
// OR
// model: 'gemini-3-pro', // ✅ Supported (higher quality)
contents: 'What is the installation procedure?',
config: {
tools: [{
fileSearch: { fileSearchStoreNames: [fileStore.name] }
}]
}
})
```
**Source:** https://ai.google.dev/gemini-api/docs/file-search
### Error 9: displayName Not Preserved for Blob Sources (Fixed v1.34.0+)
**Symptom:**
```
groundingChunks[0].title === null // No document source shown
```
**Cause:** In @google/genai versions prior to v1.34.0, when uploading files as `Blob` objects (not file paths), the SDK dropped the `displayName` and `customMetadata` configuration fields.
**Prevention:**
```typescript
// ✅ CORRECT: Upgrade to v1.34.0+ for automatic fix
npm install @google/genai@latest // v1.34.0+
await ai.fileSearchStores.uploadToFileSearchStore({
name: storeName,
file: new Blob([arrayBuffer], { type: 'application/pdf' }),
config: {
displayName: 'Safety Manual.pdf', // ✅ Now preserved
customMetadata: { version: '1.0' } // ✅ Now preserved
}
})
// ⚠️ WORKAROUND for v1.33.0 and earlier: Use resumable upload
const uploadUrl = `https://generativelanguage.googleapis.com/upload/v1beta/${storeName}:uploadToFileSearchStore?key=${API_KEY}`
// Step 1: Initiate with displayName in body
const initResponse = await fetch(uploadUrl, {
method: 'POST',
headers: {
'X-Goog-Upload-Protocol': 'resumable',
'X-Goog-Upload-Command': 'start',
'X-Goog-Upload-Header-Content-Length': numBytes.toString(),
'X-Goog-Upload-Header-Content-Type': 'application/pdf',
'Content-Type': 'application/json'
},
body: JSON.stringify({
displayName: 'Safety Manual.pdf' // ✅ Works with resumable upload
})
})
// Step 2: Upload file bytes
const uploadUrl2 = initResponse.headers.get('X-Goog-Upload-URL')
await fetch(uploadUrl2, {
method: 'PUT',
headers: {
'Content-Length': numBytes.toString(),
'X-Goog-Upload-Offset': '0',
'X-Goog-Upload-Command': 'upload, finalize',
'Content-Type': 'application/pdf'
},
body: fileBytes
})
```
**Source:** [GitHub Issue #1078](https://github.com/googleapis/js-genai/issues/1078)
### Error 10: Grounding Metadata Ignored with JSON Response Mode
**Symptom:**
```
response.candidates[0].groundingMetadata === undefined
// Even though fileSearch tool is configured
```
**Cause:** When using `responseMimeType: 'application/json'` for structured output, the API ignores the `fileSearch` tool and returns no grounding metadata, even with Gemini 3 models.
**Prevention:**
```typescript
// ❌ WRONG: Structured output overrides grounding
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'Summarize guidelines',
config: {
responseMimeType: 'application/json', // Loses grounding
tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
}
})
// ✅ CORRECT: Two-step approach
// Step 1: Get grounded text response
const textResponse = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'Summarize guidelines',
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
}
})
const grounding = textResponse.candidates[0].groundingMetadata
// Step 2: Convert to structured format in prompt
const jsonResponse = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: `Convert to JSON: ${textResponse.text}
Format:
{
"summary": "...",
"key_points": ["..."]
}`,
config: {
responseMimeType: 'application/json',
responseSchema: {
type: 'object',
properties: {
summary: { type: 'string' },
key_points: { type: 'array', items: { type: 'string' } }
}
}
}
})
// Combine results
const result = {
data: JSON.parse(jsonResponse.text),
sources: grounding.groundingChunks
}
```
**Source:** [GitHub Issue #829](https://github.com/googleapis/js-genai/issues/829)
### Error 11: Google Search and File Search Tools Are Mutually Exclusive
**Symptom:**
```
Error: "Search as a tool and file search tool are not supported together"
Status: INVALID_ARGUMENT
```
**Cause:** The Gemini API does not allow using `googleSearch` and `fileSearch` tools in the same request.
**Prevention:**
```typescript
// ❌ WRONG: Combining search tools
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'What are the latest industry guidelines?',
config: {
tools: [
{ googleSearch: {} },
{ fileSearch: { fileSearchStoreNames: [storeName] } }
]
}
})
// ✅ CORRECT: Use separate specialist agents
async function searchWeb(query: string) {
return ai.models.generateContent({
model: 'gemini-3-flash',
contents: query,
config: { tools: [{ googleSearch: {} }] }
})
}
async function searchDocuments(query: string) {
return ai.models.generateContent({
model: 'gemini-3-flash',
contents: query,
config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
})
}
// Orchestrate based on query type
const needsWeb = query.includes('latest') || query.includes('current')
const response = needsWeb
? await searchWeb(query)
: await searchDocuments(query)
```
**Source:** [GitHub Issue #435](https://github.com/googleapis/js-genai/issues/435), [Google Codelabs](https://codelabs.developers.google.com/gemini-file-search-for-rag)
### Error 12: Batch API Missing Response Metadata (Community-sourced)
**Symptom:**
Cannot correlate batch responses with requests when using metadata field.
**Cause:** When using Batch API with `InlinedRequest` that includes a `metadata` field, the corresponding `InlinedResponse` does not return the metadata.
**Prevention:**
```typescript
// ❌ WRONG: Expecting metadata in response
const batchRequest = {
metadata: { key: 'my-request-id' },
contents: [{ parts: [{ text: 'Question?' }], role: 'user' }],
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
}
}
const batchResponse = await ai.batch.create({ requests: [batchRequest] })
console.log(batchResponse.responses[0].metadata) // ❌ undefined
// ✅ CORRECT: Use array index to correlate
const requests = [
{ metadata: { id: 'req-1' }, contents: [...] },
{ metadata: { id: 'req-2' }, contents: [...] }
]
const responses = await ai.batch.create({ requests })
// Map by index (not ideal but works)
responses.responses.forEach((response, i) => {
const requestMetadata = requests[i].metadata
console.log(`Response for ${requestMetadata.id}:`, response)
})
```
**Community Verification:** Maintainer confirmed, internal bug filed.
**Source:** [GitHub Issue #1191](https://github.com/googleapis/js-genai/issues/1191)
## Setup Instructions
### Step 1: Initialize Client
```typescript
import { GoogleGenAI } from '@google/genai'
import fs from 'fs'
// Initialize client with API key
const ai = new GoogleGenAI({
apiKey: process.env.GOOGLE_API_KEY
})
// Verify API key is set
if (!process.env.GOOGLE_API_KEY) {
throw new Error('GOOGLE_API_KEY environment variable is required')
}
```
### Step 2: Create File Search Store
```typescript
// Create a store (container for documents)
const fileStore = await ai.fileSearchStores.create({
config: {
displayName: 'my-knowledge-base', // Human-readable name
// Optional: Add store-level metadata
customMetadata: {
project: 'customer-support',
environment: 'production'
}
}
})
console.log('Created store:', fileStore.name)
// Output: fileSearchStores/abc123xyz...
```
**Finding Existing Stores:**
```typescript
// List all stores (paginated)
const stores = await ai.fileSearchStores.list({
pageSize: 20 // Max 20 per page
})
// Find by display name
let targetStore = null
let pageToken = null
do {
const page = await ai.fileSearchStores.list({ pageToken })
targetStore = page.fileSearchStores.find(
s => s.displayName === 'my-knowledge-base'
)
pageToken = page.nextPageToken
} while (!targetStore && pageToken)
if (targetStore) {
console.log('Found existing store:', targetStore.name)
} else {
console.log('Store not found, creating new one...')
}
```
### Step 3: Upload Documents
**Single File Upload:**
```typescript
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream('./docs/manual.pdf'),
config: {
displayName: 'Installation Manual',
customMetadata: {
doc_type: 'manual',
version: '1.0',
language: 'en'
},
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 500,
maxOverlapTokens: 50
}
}
}
})
// Poll until indexing complete
while (!operation.done) {
await new Promise(resolve => setTimeout(resolve, 1000))
operation = await ai.operations.get({ name: operation.name })
}
console.log('✅ Indexed:', operation.response.displayName)
```
**Batch Upload (Concurrent):**
```typescript
const filePaths = [
'./docs/manual.pdf',
'./docs/faq.md',
'./docs/troubleshooting.docx'
]
// Upload all files concurrently
const uploadPromises = filePaths.map(filePath =>
ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream(filePath),
config: {
displayName: filePath.split('/').pop(),
customMetadata: {
doc_type: 'support',
source_path: filePath
},
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 500,
maxOverlapTokens: 50
}
}
}
})
)
const operations = await Promise.all(uploadPromises)
// Poll all operations
for (const operation of operations) {
let op = operation
while (!op.done) {
await new Promise(resolve => setTimeout(resolve, 1000))
op = await ai.operations.get({ name: op.name })
}
console.log('✅ Indexed:', op.response.displayName)
}
```
### Step 4: Query with File Search
**Basic Query:**
```typescript
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'What are the safety precautions for installation?',
config: {
tools: [{
fileSearch: {
fileSearchStoreNames: [fileStore.name]
}
}]
}
})
console.log('Answer:', response.text)
// Access citations
const grounding = response.candidates[0].groundingMetadata
if (grounding?.groundingChunks) {
console.log('\nSources:')
grounding.groundingChunks.forEach((chunk, i) => {
console.log(`${i + 1}. ${chunk.retrievedContext?.title || 'Unknown'}`)
console.log(` URI: ${chunk.retrievedContext?.uri || 'N/A'}`)
})
}
```
**Query with Metadata Filtering:**
```typescript
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'How do I reset the device?',
config: {
tools: [{
fileSearch: {
fileSearchStoreNames: [fileStore.name],
// Filter to only search troubleshooting docs in English, version 1.0
metadataFilter: 'doc_type="troubleshooting" AND language="en" AND version="1.0"'
}
}]
}
})
console.log('Answer:', response.text)
```
**Metadata Filter Syntax:**
- AND: `key1="value1" AND key2="value2"`
- OR: `key1="value1" OR key1="value2"`
- Parentheses: `(key1="a" OR key1="b") AND key2="c"`
### Step 5: List and Manage Documents
```typescript
// List all documents in store
const docs = await ai.fileSearchStores.documents.list({
parent: fileStore.name,
pageSize: 20
})
console.log(`Total documents: ${docs.documents?.length || 0}`)
docs.documents?.forEach(doc => {
console.log(`- ${doc.displayName} (${doc.name})`)
console.log(` Metadata:`, doc.customMetadata)
})
// Get specific document details
const docDetails = await ai.fileSearchStores.documents.get({
name: docs.documents[0].name
})
console.log('Document details:', docDetails)
// Delete document
await ai.fileSearchStores.documents.delete({
name: docs.documents[0].name,
force: true
})
```
### Step 6: Cleanup
```typescript
// Delete entire store (force deletes all documents)
await ai.fileSearchStores.delete({
name: fileStore.name,
force: true
})
console.log('✅ Store deleted')
```
## Recommended Chunking Strategies
Chunking configuration significantly impacts retrieval quality. Adjust based on content type:
### Technical Documentation
```typescript
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 500, // Smaller chunks for precise code/API lookup
maxOverlapTokens: 50 // 10% overlap
}
}
```
**Best for:** API docs, SDK references, code examples, configuration guides
### Prose and Articles
```typescript
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 800, // Larger chunks preserve narrative flow
maxOverlapTokens: 80 // 10% overlap
}
}
```
**Best for:** Blog posts, news articles, product descriptions, marketing materials
### Legal and Contracts
```typescript
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 300, // Very small chunks for high precision
maxOverlapTokens: 30 // 10% overlap
}
}
```
**Best for:** Legal documents, contracts, regulations, compliance docs
### FAQ and Support
```typescript
chunkingConfig: {
whiteSpaceConfig: {
maxTokensPerChunk: 400, // Medium chunks (1-2 Q&A pairs)
maxOverlapTokens: 40 // 10% overlap
}
}
```
**Best for:** FAQs, troubleshooting guides, how-to articles
**General Rule:** Maintain 10% overlap (overlap = chunk size / 10) to prevent context loss at chunk boundaries.
## Metadata Best Practices
Design metadata schema for filtering and organization:
### Example: Customer Support Knowledge Base
```typescript
customMetadata: {
doc_type: 'faq' | 'manual' | 'troubleshooting' | 'guide',
product: 'widget-pro' | 'widget-lite',
version: '1.0' | '2.0',
language: 'en' | 'es' | 'fr',
category: 'installation' | 'configuration' | 'maintenance',
priority: 'critical' | 'normal' | 'low',
last_updated: '2025-01-15',
author: 'support-team'
}
```
**Query Example:**
```typescript
metadataFilter: 'product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"'
```
### Example: Legal Document Repository
```typescript
customMetadata: {
doc_type: 'contract' | 'regulation' | 'case-law' | 'policy',
jurisdiction: 'US' | 'EU' | 'UK',
practice_area: 'employment' | 'corporate' | 'ip' | 'tax',
effective_date: '2025-01-01',
status: 'active' | 'archived',
confidentiality: 'public' | 'internal' | 'privileged'
}
```
### Example: Code Documentation
```typescript
customMetadata: {
doc_type: 'api-reference' | 'tutorial' | 'example' | 'changelog',
language: 'javascript' | 'python' | 'java' | 'go',
framework: 'react' | 'nextjs' | 'express' | 'fastapi',
version: '1.2.0',
difficulty: 'beginner' | 'intermediate' | 'advanced'
}
```
**Tips:**
- Use consistent key naming (`snake_case` or `camelCase`)
- Limit to most important filterable fields (20 max)
- Use enums/constants for values (easier filtering)
- Include version and date fields for time-based filtering
## Cost Optimization
### 1. Deduplicate Before Upload
```typescript
// Track uploaded file hashes to avoid duplicates
const uploadedHashes = new Set<string>()
async function uploadWithDeduplication(filePath: string) {
const fileHash = await getFileHash(filePath)
if (uploadedHashes.has(fileHash)) {
console.log(`Skipping duplicate: ${filePath}`)
return
}
await ai.fileSearchStores.uploadToFileSearchStore({
name: fileStore.name,
file: fs.createReadStream(filePath)
})
uploadedHashes.add(fileHash)
}
```
### 2. Compress Large Files
```typescript
// Convert images to text before indexing (OCR)
// Compress PDFs (remove images, use text-only)
// Use markdown instead of Word docs (smaller size)
```
### 3. Use Metadata Filtering to Reduce Query Scope
```typescript
// ❌ EXPENSIVE: Search all 10GB of documents
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'Reset procedure?',
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
}
})
// ✅ CHEAPER: Filter to only troubleshooting docs (subset)
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'Reset procedure?',
config: {
tools: [{
fileSearch: {
fileSearchStoreNames: [fileStore.name],
metadataFilter: 'doc_type="troubleshooting"' // Reduces search scope
}
}]
}
})
```
### 4. Choose Flash Over Pro for Cost Savings
```typescript
// Gemini 3 Flash is 10x cheaper than Pro for queries
// Use Flash unless you need Pro's advanced reasoning
// Development/testing: Use Flash
model: 'gemini-3-flash'
// Production (high-stakes answers): Use Pro
model: 'gemini-3-pro'
```
### 5. Monitor Storage Usage
```typescript
// List stores and estimate storage
const stores = await ai.fileSearchStores.list()
for (const store of stores.fileSearchStores || []) {
const docs = await ai.fileSearchStores.documents.list({
parent: store.name
})
console.log(`Store: ${store.displayName}`)
console.log(`Documents: ${docs.documents?.length || 0}`)
// Estimate storage (3x input size)
console.log(`Estimated storage: ~${(docs.documents?.length || 0) * 10} MB`)
}
```
## Testing & Verification
### Verify Store Creation
```typescript
const store = await ai.fileSearchStores.get({
name: fileStore.name
})
console.assert(store.displayName === 'my-knowledge-base', 'Store name mismatch')
console.log('✅ Store created successfully')
```
### Verify Document Indexing
```typescript
const docs = await ai.fileSearchStores.documents.list({
parent: fileStore.name
})
console.assert(docs.documents?.length > 0, 'No documents indexed')
console.log(`✅ ${docs.documents?.length} documents indexed`)
```
### Verify Query Functionality
```typescript
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'What is this knowledge base about?',
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
}
})
console.assert(response.text.length > 0, 'Empty response')
console.log('✅ Query successful:', response.text.substring(0, 100) + '...')
```
### Verify Citations
```typescript
const response = await ai.models.generateContent({
model: 'gemini-3-flash',
contents: 'Provide a specific answer with citations.',
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
}
})
const grounding = response.candidates[0].groundingMetadata
console.assert(
grounding?.groundingChunks?.length > 0,
'No grounding/citations returned'
)
console.log(`✅ ${grounding?.groundingChunks?.length} citations returned`)
```
## Integration Examples
### Streaming Support
File Search supports streaming responses with `generateContentStream()`:
```typescript
// ✅ Streaming works with File Search (v1.34.0+)
const stream = await ai.models.generateContentStream({
model: 'gemini-3-flash',
contents: 'Summarize the document',
config: {
tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
}
})
for await (const chunk of stream) {
process.stdout.write(chunk.text)
}
// Access grounding after stream completes
const grounding = stream.candidates[0].groundingMetadata
```
**Note:** Early SDK versions (pre-v1.34.0) may have had streaming issues. Use v1.34.0+ for reliable streaming support.
**Source:** [GitHub Issue #1221](https://github.com/googleapis/js-genai/issues/1221)
### Working Templates
This skill includes 3 working templates in the `templates/` directory:
### Template 1: basic-node-rag
Minimal Node.js/TypeScript example demonstrating:
- Create file search store
- Upload multiple documents
- Query with natural language
- Display citations
**Use when:** Learning File Search, prototyping, simple CLI tools
**Run:**
```bash
cd templates/basic-node-rag
npm install
npm run dev
```
### Template 2: cloudflare-worker-rag
Cloudflare Workers integration showing:
- Edge API for document upload
- Edge API for semantic search
- Integration with R2 for document storage
- Hybrid architecture (Gemini File Search + Cloudflare edge)
**Use when:** Building global edge applications, integrating with Cloudflare stack
**Deploy:**
```bash
cd templates/cloudflare-worker-rag
npm install
npx wrangler deploy
```
### Template 3: nextjs-docs-search
Full-stack Next.js application featuring:
- Document upload UI with drag-and-drop
- Real-time search interface
- Citation rendering with source links
- Metadata filtering UI
**Use when:** Building production documentation sites, knowledge bases
**Run:**
```bash
cd templates/nextjs-docs-search
npm install
npm run dev
```
## References
**Official Documentation:**
- File Search Overview: https://ai.google.dev/gemini-api/docs/file-search
- API Reference (Stores): https://ai.google.dev/api/file-search/file-search-stores
- API Reference (Documents): https://ai.google.dev/api/file-search/documents
- Blog Announcement: https://blog.google/technology/developers/file-search-gemini-api/
- Pricing: https://ai.google.dev/pricing
**Tutorials:**
- JavaScript/TypeScript Guide: https://www.philschmid.de/gemini-file-search-javascript
- SDK Repository: https://github.com/googleapis/js-genai
**Bundled Resources in This Skill:**
- `references/api-reference.md` - Complete API documentation
- `references/chunking-best-practices.md` - Detailed chunking strategies
- `references/pricing-calculator.md` - Cost estimation guide
- `references/migration-from-openai.md` - Migration guide from OpenAI Files API
- `scripts/create-store.ts` - CLI tool to create stores
- `scripts/upload-batch.ts` - Batch upload script
- `scripts/query-store.ts` - Interactive query tool
- `scripts/cleanup.ts` - Cleanup script
**Working Templates:**
- `templates/basic-node-rag/` - Minimal Node.js example
- `templates/cloudflare-worker-rag/` - Edge deployment example
- `templates/nextjs-docs-search/` - Full-stack Next.js app
---
**Skill Version:** 1.1.0
**Last Verified:** 2026-01-21
**Package Version:** @google/genai ^1.38.0 (minimum 1.29.0 required)
**Token Savings:** ~67%
**Errors Prevented:** 12
**Changes:** Added 4 new errors from community research (displayName Blob issue, grounding with JSON mode, tool conflicts, batch API metadata), enhanced polling timeout pattern with fallback verification, added streaming support note
This skill integrates Google Gemini File Search to build managed document Q&A and semantic search. It automates chunking, embeddings, indexing, and citation-ready retrieval for 100+ file types so you can query content with natural language. It includes setup guidance, cost and quota considerations, chunking recommendations, and common error prevention.
The skill configures the @google/genai File Search API, uploads documents to a file search store, and monitors asynchronous indexing operations. It automatically chunks files, creates embeddings, and stores grounding metadata so Gemini models (Gemini 3 Pro / Flash) can return grounded answers with citations. It also supplies polling patterns, delete/re-upload flows, and SDK version workarounds.
Which Gemini models support File Search?
File Search requires Gemini 3 Pro or Gemini 3 Flash. Older models are not supported.
How do I update a document after indexing?
Documents are immutable. Delete the existing document (force=true if needed) and re-upload the updated version.