bedrock skill

This skill helps you build AI applications with AWS Bedrock foundation models, covering text generation, embeddings, and RAG patterns.

npx playbooks add skill itsmostafa/aws-agent-skills --skill bedrock

---
name: bedrock
description: AWS Bedrock foundation models for generative AI. Use when invoking foundation models, building AI applications, creating embeddings, configuring model access, or implementing RAG patterns.
last_updated: "2026-01-07"
doc_source: https://docs.aws.amazon.com/bedrock/latest/userguide/
---

# AWS Bedrock

Amazon Bedrock provides access to foundation models (FMs) from AI companies through a unified API. Build generative AI applications with text generation, embeddings, and image generation capabilities.

## Table of Contents

- [Core Concepts](#core-concepts)
- [Common Patterns](#common-patterns)
- [CLI Reference](#cli-reference)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
- [References](#references)

## Core Concepts

### Foundation Models

Pre-trained models available through Bedrock:
- **Claude** (Anthropic): Text generation, analysis, coding
- **Titan** (Amazon): Text, embeddings, image generation
- **Llama** (Meta): Open-weight text generation
- **Mistral**: Efficient text generation
- **Stable Diffusion** (Stability AI): Image generation

### Model Access

Models must be enabled in your account before use:
- Request access in Bedrock console
- Some models require acceptance of EULAs
- Access is region-specific

### Inference Types

| Type | Use Case | Pricing |
|------|----------|---------|
| **On-Demand** | Variable workloads | Per token |
| **Provisioned Throughput** | Consistent high-volume | Hourly commitment |
| **Batch Inference** | Async large-scale | Discounted per token |

## Common Patterns

### Invoke Model (Text Generation)

**AWS CLI:**

```bash
# Invoke Claude
aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
  --content-type application/json \
  --accept application/json \
  --body '{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain AWS Lambda in 3 sentences."}
    ]
  }' \
  response.json

jq -r '.content[0].text' response.json
```

**boto3:**

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def invoke_claude(prompt, max_tokens=1024):
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': max_tokens,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )

    result = json.loads(response['body'].read())
    return result['content'][0]['text']

# Usage
response = invoke_claude('What is Amazon S3?')
print(response)
```

### Streaming Response

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def stream_claude(prompt):
    response = bedrock.invoke_model_with_response_stream(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Usage
for text in stream_claude('Write a haiku about cloud computing.'):
    print(text, end='', flush=True)
```

### Generate Embeddings

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def get_embedding(text):
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'inputText': text,
            'dimensions': 1024,
            'normalize': True
        })
    )

    result = json.loads(response['body'].read())
    return result['embedding']

# Usage
embedding = get_embedding('AWS Lambda is a serverless compute service.')
print(f'Embedding dimension: {len(embedding)}')
```
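Embeddings are usually consumed by a similarity search. A minimal pure-Python sketch of ranking documents against a query embedding (the helper names are illustrative); note that with `normalize: True`, Titan v2 embeddings are unit-length, so cosine similarity reduces to a dot product:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_embedding, doc_embeddings):
    """doc_embeddings: list of (doc_id, embedding) pairs.
    Returns (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_embedding, emb))
              for doc_id, emb in doc_embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For production-scale RAG you would hand this off to a vector store (e.g., OpenSearch or a Bedrock Knowledge Base) rather than scanning in memory.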

### Conversation with History

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

class Conversation:
    def __init__(self, system_prompt=None):
        self.messages = []
        self.system = system_prompt

    def chat(self, user_message):
        self.messages.append({
            'role': 'user',
            'content': user_message
        })

        body = {
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': self.messages
        }

        if self.system:
            body['system'] = self.system

        response = bedrock.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )

        result = json.loads(response['body'].read())
        assistant_message = result['content'][0]['text']

        self.messages.append({
            'role': 'assistant',
            'content': assistant_message
        })

        return assistant_message

# Usage
conv = Conversation(system_prompt='You are an AWS solutions architect.')
print(conv.chat('What database should I use for a chat application?'))
print(conv.chat('What about for time-series data?'))
```
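Long conversations eventually exceed the model's context window. One hedged approach (the cap of 20 messages is an arbitrary assumption, not a Bedrock limit) is to keep only the most recent turns, taking care to start on a `user` message so the alternating roles the Messages API expects are preserved:

```python
def trim_history(messages, max_messages=20):
    """Keep at most max_messages of the most recent messages,
    dropping leading assistant turns so the list starts with 'user'."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    while trimmed and trimmed[0]['role'] != 'user':
        trimmed = trimmed[1:]
    return trimmed
```

You could call this inside `Conversation.chat` before building the request body, or summarize older turns instead of discarding them.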

### List Available Models

```bash
# List all foundation models
aws bedrock list-foundation-models \
  --query 'modelSummaries[*].[modelId,modelName,providerName]' \
  --output table

# Filter by provider
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --query 'modelSummaries[*].modelId'

# Get model details
aws bedrock get-foundation-model \
  --model-identifier anthropic.claude-3-sonnet-20240229-v1:0
```

### Request Model Access

```bash
# List the agreement offers (terms of use) you would need to accept for a model;
# access itself is granted through the Bedrock console
aws bedrock list-foundation-model-agreement-offers \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0
```

## CLI Reference

### Bedrock (Control Plane)

| Command | Description |
|---------|-------------|
| `aws bedrock list-foundation-models` | List available models |
| `aws bedrock get-foundation-model` | Get model details |
| `aws bedrock list-custom-models` | List fine-tuned models |
| `aws bedrock create-model-customization-job` | Start fine-tuning |
| `aws bedrock list-provisioned-model-throughputs` | List provisioned capacity |

### Bedrock Runtime (Data Plane)

| Command | Description |
|---------|-------------|
| `aws bedrock-runtime invoke-model` | Invoke model synchronously |
| `aws bedrock-runtime invoke-model-with-response-stream` | Invoke with streaming |
| `aws bedrock-runtime converse` | Multi-turn conversation API |
| `aws bedrock-runtime converse-stream` | Streaming conversation |
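The Converse API accepts the same request shape across model providers, so no per-model JSON body is needed. A minimal boto3 sketch (the helper names are ours; the `converse` call and response shape follow the Bedrock Runtime API):

```python
def build_converse_request(model_id, prompt, system=None, max_tokens=1024):
    """Build keyword arguments for the bedrock-runtime Converse API."""
    kwargs = {
        'modelId': model_id,
        'messages': [{'role': 'user', 'content': [{'text': prompt}]}],
        'inferenceConfig': {'maxTokens': max_tokens},
    }
    if system is not None:
        kwargs['system'] = [{'text': system}]
    return kwargs

def converse(prompt, model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    import boto3  # imported lazily so the helpers above work without boto3
    bedrock = boto3.client('bedrock-runtime')
    response = bedrock.converse(**build_converse_request(model_id, prompt))
    return response['output']['message']['content'][0]['text']
```

Switching models then only means changing `modelId`, rather than rewriting the request body for each provider.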

### Bedrock Agent Runtime

| Command | Description |
|---------|-------------|
| `aws bedrock-agent-runtime invoke-agent` | Invoke a Bedrock agent |
| `aws bedrock-agent-runtime retrieve` | Query knowledge base |
| `aws bedrock-agent-runtime retrieve-and-generate` | RAG query |

## Best Practices

### Cost Optimization

- **Use appropriate models**: Smaller models for simple tasks
- **Set max_tokens**: Limit output length when possible
- **Cache responses**: For repeated identical queries
- **Batch when possible**: Use batch inference for bulk processing
- **Monitor usage**: Set up CloudWatch alarms for cost
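"Cache responses" above can be as simple as keying on a hash of the model ID and request body. A minimal in-memory sketch (the class name and callback signature are illustrative; a production setup would add TTLs and a shared store such as Redis or DynamoDB):

```python
import hashlib
import json

class ResponseCache:
    """In-memory cache keyed on a hash of (model_id, request body)."""

    def __init__(self):
        self._cache = {}

    def _key(self, model_id, body):
        payload = json.dumps({'model': model_id, 'body': body}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_invoke(self, model_id, body, invoke_fn):
        """Return the cached response, or call invoke_fn(model_id, body)
        once and remember the result."""
        key = self._key(model_id, body)
        if key not in self._cache:
            self._cache[key] = invoke_fn(model_id, body)
        return self._cache[key]
```

This only helps for byte-identical requests; semantic caching (matching near-duplicate prompts via embeddings) is a separate, more involved technique.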

### Performance

- **Use streaming**: For better user experience with long outputs
- **Connection pooling**: Reuse boto3 clients
- **Regional deployment**: Use closest region to reduce latency
- **Provisioned throughput**: For consistent high-volume workloads

### Security

- **Least privilege IAM**: Only grant needed model access
- **VPC endpoints**: Keep traffic private
- **Guardrails**: Implement content filtering
- **Audit with CloudTrail**: Track model invocations

### IAM Permissions

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
      ]
    }
  ]
}
```

## Troubleshooting

### AccessDeniedException

**Causes:**
- Model access not enabled in console
- IAM policy missing `bedrock:InvokeModel`
- Wrong model ID or region

**Debug:**

```bash
# Check model access status
aws bedrock list-foundation-models \
  --query 'modelSummaries[?modelId==`anthropic.claude-3-sonnet-20240229-v1:0`]'

# Test IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/my-role \
  --action-names bedrock:InvokeModel \
  --resource-arns "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
```

### ModelNotReadyException

**Cause:** Model is still being provisioned or temporarily unavailable.

**Solution:** Implement retry with exponential backoff:

```python
import json
import time
from botocore.exceptions import ClientError

def invoke_with_retry(bedrock, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            return bedrock.invoke_model(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                body=json.dumps(body)
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'ModelNotReadyException':
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception('Max retries exceeded')
```

### ThrottlingException

**Causes:**
- Exceeded on-demand quota
- Too many concurrent requests

**Solutions:**
- Request quota increase
- Implement exponential backoff
- Consider provisioned throughput
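The backoff pattern shown for ModelNotReadyException generalizes to throttling. A hedged sketch with jitter (the retryable check is passed in as a callback so it works for any error class):

```python
import random
import time

def with_backoff(fn, is_retryable, max_retries=5, base_delay=1.0):
    """Call fn(), retrying errors that is_retryable(exc) marks retryable,
    with exponential backoff plus random jitter between attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

With botocore, `is_retryable` could check `exc.response['Error']['Code']` against `{'ThrottlingException', 'ModelNotReadyException'}`; boto3 also has built-in retry modes (`adaptive`) configurable via `botocore.config.Config`.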

### ValidationException

**Common issues:**
- Invalid model ID
- Malformed request body
- max_tokens exceeds model limit

**Debug:**

```bash
# Check model-specific requirements
aws bedrock get-foundation-model \
  --model-identifier anthropic.claude-3-sonnet-20240229-v1:0 \
  --query 'modelDetails.inferenceTypesSupported'
```

## References

- [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/)
- [Bedrock API Reference](https://docs.aws.amazon.com/bedrock/latest/APIReference/)
- [Bedrock Runtime API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html)
- [Model Parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)
- [Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)

Overview

This skill provides an interface to AWS Bedrock foundation models for building generative AI applications. It exposes patterns for text generation, streaming responses, embeddings, conversation state, model listing, and model access management. Use it to integrate Claude, Titan, Llama, Mistral, and Stable Diffusion models through a unified API and boto3/CLI examples.

How this skill works

The skill wraps Bedrock Runtime and control-plane operations to invoke models synchronously or with streaming, create embeddings, and manage multi-turn conversations with history. It demonstrates common CLI and boto3 calls, handling model IDs, content types, and request/response JSON bodies. It also covers model discovery, access requests, and error handling patterns like retries and exponential backoff.

When to use it

  • Building text-generation features (summaries, Q&A, code generation).
  • Creating embeddings for retrieval-augmented generation (RAG) pipelines.
  • Streaming long-form outputs to improve user experience.
  • Managing model access, listing available models, or accepting required model EULAs.
  • Optimizing inference cost and throughput for production workloads.

Best practices

  • Choose the smallest model that meets quality needs to reduce cost and latency.
  • Set max_tokens and cache repeated responses to limit spend.
  • Use streaming for long responses and provisioned throughput for high-volume steady traffic.
  • Apply least-privilege IAM, enable VPC endpoints, and audit invocations with CloudTrail.
  • Implement exponential backoff and retries for ModelNotReady and throttling errors.

Example use cases

  • Invoke Claude to generate technical explanations or developer docs.
  • Produce 1024-dimension embeddings with Titan for semantic search and RAG.
  • Stream model output to a chat UI for progressive display of long answers.
  • Maintain conversation history for a multi-turn assistant with system prompts.
  • Automate model discovery and access checks before running inference jobs.

FAQ

How do I choose between on-demand and provisioned throughput?

Use on-demand for variable or unpredictable workloads and provisioned throughput when you need consistent low latency at scale and can commit to hourly capacity.

What causes AccessDeniedException and how to fix it?

AccessDenied usually means the model isn't enabled for your account, the region is wrong, or IAM lacks bedrock:InvokeModel permissions. Verify model access in the console and update IAM policies with least-privilege ARNs.