home / skills / williamzujkowski / standards / serverless

serverless skill

/skills/cloud-native/serverless

This skill helps you design and implement serverless architectures in Python, delivering auto-scaling, pay-per-use benefits and rapid time-to-market.

npx playbooks add skill williamzujkowski/standards --skill serverless

Review the files below or copy the command above to add this skill to your agents.

Files (11)
SKILL.md
13.9 KB
---
name: serverless
category: cloud-native
difficulty: intermediate
tags:
- aws-lambda
- serverless
- cloud-functions
- faas
- event-driven
related:
- kubernetes
- microservices
- monitoring
version: 1.0.0
description: 'Benefits:'
---


# Serverless Computing

## Level 1: Quick Reference

### Serverless Benefits and Tradeoffs

**Benefits:**

- **Zero Server Management**: No OS patching, scaling, or capacity planning
- **Automatic Scaling**: Scales from zero to thousands of concurrent executions
- **Pay-Per-Use**: Only charged for actual execution time (100ms granularity)
- **Built-in HA**: Multi-AZ deployment by default
- **Fast Time-to-Market**: Focus on code, not infrastructure

**Tradeoffs:**

- **Cold Starts**: 100ms-5s latency for new container initialization
- **Execution Limits**: 15 min max (Lambda), 60 min (Cloud Functions)
- **Vendor Lock-in**: Platform-specific APIs and deployment models
- **Debugging Complexity**: Distributed tracing across ephemeral environments
- **State Management**: Stateless by design, requires external persistence

### Common Serverless Patterns

**1. API Backend (HTTP Trigger)**

```
Client → API Gateway → Lambda → Database
- RESTful APIs, GraphQL endpoints
- Authentication/authorization at gateway
- Response caching, throttling, API keys
```

**2. Event-Driven Processing (Event Trigger)**

```
S3 Upload → Lambda → Process File → Store Result
SQS Queue → Lambda → Transform Data → Publish Event
DynamoDB Stream → Lambda → Aggregate Metrics
```

**3. Scheduled Tasks (Cron Trigger)**

```
EventBridge Rule (cron) → Lambda → Cleanup/Report/Backup
- Data aggregation (hourly, daily)
- Automated backups and archival
- Health checks and monitoring
```

**4. Stream Processing**

```
Kinesis/Kafka → Lambda → Real-time Analytics → Dashboard
- Log processing and filtering
- Clickstream analysis
- IoT data ingestion
```

### Essential Serverless Checklist

**Architecture:**

- [ ] Function size < 50 MB (Lambda), memory 128-10240 MB
- [ ] Single responsibility per function
- [ ] Async patterns for long-running tasks (SQS, Step Functions)
- [ ] Idempotent handlers (retry-safe)

**Cold Start Optimization:**

- [ ] Provisioned concurrency for latency-sensitive APIs
- [ ] Minimize dependencies (slim packages)
- [ ] Initialize SDK clients outside handler
- [ ] Use compiled languages (Go, Rust) for sub-100ms starts

**Timeouts and Concurrency:**

- [ ] Set appropriate timeout (default 3s, max 15min Lambda)
- [ ] Configure reserved/unreserved concurrency limits
- [ ] Use SQS for rate limiting and backpressure
- [ ] Monitor throttles and errors (CloudWatch alarms)

**Security:**

- [ ] Least-privilege IAM roles (one per function)
- [ ] Secrets in AWS Secrets Manager/Parameter Store
- [ ] VPC configuration for database access (increases cold start)
- [ ] Environment variable encryption (AWS KMS)

**Observability:**

- [ ] Structured logging (JSON) with correlation IDs
- [ ] X-Ray tracing for distributed requests
- [ ] Custom CloudWatch metrics (business KPIs)
- [ ] Alerting on error rates, duration, throttles

**Cost Optimization:**

- [ ] Right-size memory (CPU scales with memory)
- [ ] Use ARM (Graviton2) for 20% cost reduction
- [ ] Delete unused functions and versions
- [ ] Monitor invocation count and duration trends

---

## Level 2:
>
> **📚 Full Examples**: See [REFERENCE.md](./REFERENCE.md) for complete code samples, detailed configurations, and production-ready implementations.

 Implementation Guide

### AWS Lambda Fundamentals

**Runtime Lifecycle:**

1. **INIT Phase** (cold start):
   - Download deployment package from S3
   - Start runtime (Python, Node.js, Java, Go, etc.)
   - Execute initialization code (outside handler)
   - Load dependencies and establish connections
   - Typical duration: 100ms-5s

2. **INVOKE Phase**:
   - Execute handler function
   - Process event payload
   - Return response or error
   - Billed duration: rounded up to nearest 1ms

3. **SHUTDOWN Phase**:
   - Runtime shutdown after inactivity (~10-15 min)
   - Connections closed, temp files deleted

**Lambda Handler Pattern:**


*See [REFERENCE.md](./REFERENCE.md#example-0) for complete implementation.*


**Lambda Layers:**

- Shared code libraries across multiple functions
- Max 5 layers per function, 250 MB total (unzipped)
- Common use cases: SDKs, custom libraries, config files


*See [REFERENCE.md](./REFERENCE.md#example-1) for complete implementation.*


**Deployment Packages:**

- **Zip Archive**: Up to 50 MB (direct upload), 250 MB (S3)
- **Container Image**: Up to 10 GB (ECR), cold start +1-2s

**Multi-Cloud Comparison:**

| Feature | AWS Lambda | Google Cloud Functions | Azure Functions |
|---------|-----------|----------------------|-----------------|
| **Max Duration** | 15 minutes | 60 minutes (2nd gen) | 10 minutes (Consumption) |
| **Memory** | 128 MB - 10 GB | 128 MB - 32 GB | 128 MB - 14 GB |
| **Cold Start** | 100-500ms (Python/Node) | 200-800ms | 150-600ms |
| **Concurrency** | 1000 default (soft limit) | 1000 per region | 200 per instance |
| **Pricing** | $0.20/1M requests + $0.0000166667/GB-s | $0.40/1M + $0.0000025/GB-s | $0.20/1M + $0.000016/GB-s |
| **Triggers** | 20+ (S3, DynamoDB, SQS, API Gateway) | 10+ (HTTP, Pub/Sub, Storage) | 15+ (HTTP, Queue, Blob) |

### Serverless Frameworks

**1. AWS SAM (Serverless Application Model):**

- CloudFormation extension for serverless resources
- Local testing with `sam local`
- Built-in CI/CD with CodePipeline integration


*See [REFERENCE.md](./REFERENCE.md#example-2) for complete implementation.*


**2. Serverless Framework:**

- Multi-cloud support (AWS, Azure, GCP)
- Plugin ecosystem (offline, webpack, domain-manager)
- Environment-based deployments


*See [REFERENCE.md](./REFERENCE.md#example-3) for complete implementation.*


**3. AWS CDK (Cloud Development Kit):**

- Infrastructure as code in Python/TypeScript/Java
- Higher-level constructs (L2/L3)
- Type safety and IDE autocomplete


*See [REFERENCE.md](./REFERENCE.md#example-4) for complete implementation.*


### Event Sources and Triggers

**1. API Gateway (HTTP/REST):**


*See [REFERENCE.md](./REFERENCE.md#example-5) for complete implementation.*


**2. S3 Events:**


*See [REFERENCE.md](./REFERENCE.md#example-6) for complete implementation.*


**3. SQS (Queue):**


*See [REFERENCE.md](./REFERENCE.md#example-7) for complete implementation.*


**4. EventBridge (Scheduled/Custom):**


*See [REFERENCE.md](./REFERENCE.md#example-8) for complete implementation.*


**5. DynamoDB Streams:**


*See [REFERENCE.md](./REFERENCE.md#example-9) for complete implementation.*


### Cold Start Optimization

**Understanding Cold Starts:**

- **Triggered by**: First invocation, scaling up, runtime updates
- **Components**: Download code → Start runtime → Init code → Invoke handler
- **Impact**: P99 latency spikes, user-facing API degradation

**Optimization Techniques:**

**1. Provisioned Concurrency:**


*See [REFERENCE.md](./REFERENCE.md#example-10) for complete implementation.*


**2. Minimize Package Size:**


*See [REFERENCE.md](./REFERENCE.md#example-11) for complete implementation.*


**3. Lazy Loading:**


*See [REFERENCE.md](./REFERENCE.md#example-12) for complete implementation.*


**4. Connection Pooling:**


*See [REFERENCE.md](./REFERENCE.md#example-13) for complete implementation.*


**5. Choose Fast Runtimes:**

```
Cold Start Benchmarks (512 MB):
- Go: 100-150ms
- Rust: 120-180ms
- Python 3.11: 200-300ms
- Node.js 20: 250-350ms
- Java 17: 1-2s (use Snapstart for sub-200ms)
- .NET 7: 500-800ms
```

**6. Lambda SnapStart (Java):**


*See [REFERENCE.md](./REFERENCE.md#example-15) for complete implementation.*


### State Management

**Ephemeral Storage (Local):**


*See [REFERENCE.md](./REFERENCE.md#example-16) for complete implementation.*


**DynamoDB (Key-Value):**


*See [REFERENCE.md](./REFERENCE.md#example-17) for complete implementation.*


**S3 (Large Objects):**


*See [REFERENCE.md](./REFERENCE.md#example-18) for complete implementation.*


**ElastiCache (Distributed Cache):**


*See [REFERENCE.md](./REFERENCE.md#example-19) for complete implementation.*


**Step Functions (Workflow State):**


*See [REFERENCE.md](./REFERENCE.md#example-20) for complete implementation.*


### Observability and Monitoring

**Structured Logging:**


*See [REFERENCE.md](./REFERENCE.md#example-21) for complete implementation.*


**X-Ray Tracing:**


*See [REFERENCE.md](./REFERENCE.md#example-22) for complete implementation.*


**Custom CloudWatch Metrics:**


*See [REFERENCE.md](./REFERENCE.md#example-23) for complete implementation.*


**CloudWatch Alarms:**


*See [REFERENCE.md](./REFERENCE.md#example-24) for complete implementation.*


### Security Best Practices

**IAM Roles and Permissions:**


*See [REFERENCE.md](./REFERENCE.md#example-25) for complete implementation.*


**Secrets Management:**


*See [REFERENCE.md](./REFERENCE.md#example-26) for complete implementation.*


**VPC Configuration:**


*See [REFERENCE.md](./REFERENCE.md#example-27) for complete implementation.*


**Input Validation:**


*See [REFERENCE.md](./REFERENCE.md#example-28) for complete implementation.*


### Testing Strategies

**Unit Tests:**


*See [REFERENCE.md](./REFERENCE.md#example-29) for complete implementation.*


**Integration Tests:**


*See [REFERENCE.md](./REFERENCE.md#example-30) for complete implementation.*


**Local Testing (SAM CLI):**


*See [REFERENCE.md](./REFERENCE.md#example-31) for complete implementation.*


### Cost Optimization

**Memory Sizing:**


*See [REFERENCE.md](./REFERENCE.md#example-32) for complete implementation.*


**ARM (Graviton2) Migration:**


*See [REFERENCE.md](./REFERENCE.md#example-33) for complete implementation.*


**Monitoring Costs:**


*See [REFERENCE.md](./REFERENCE.md#example-34) for complete implementation.*


**Cleanup and Governance:**


*See [REFERENCE.md](./REFERENCE.md#example-35) for complete implementation.*


## Examples

### Basic Usage


*See [REFERENCE.md](./REFERENCE.md#example-36) for complete implementation.*


### Advanced Usage

```python
// TODO: Add advanced example for serverless
// This example shows production-ready patterns
```

### Integration Example

```python
// TODO: Add integration example showing how serverless
// works with other systems and services
```

See `examples/serverless/` for complete working examples.

## Integration Points

This skill integrates with:

### Upstream Dependencies

- **Tools**: Common development tools and frameworks
- **Prerequisites**: Basic understanding of general concepts

### Downstream Consumers

- **Applications**: Production systems requiring serverless functionality
- **CI/CD Pipelines**: Automated testing and deployment workflows
- **Monitoring Systems**: Observability and logging platforms

### Related Skills

- See other skills in this category

### Common Integration Patterns

1. **Development Workflow**: How this skill fits into daily development
2. **Production Deployment**: Integration with production systems
3. **Monitoring & Alerting**: Observability integration points

## Common Pitfalls

### Pitfall 1: Insufficient Testing

**Problem:** Not testing edge cases and error conditions leads to production bugs

**Solution:** Implement comprehensive test coverage including:

- Happy path scenarios
- Error handling and edge cases
- Integration points with external systems

**Prevention:** Enforce minimum code coverage (80%+) in CI/CD pipeline

### Pitfall 2: Hardcoded Configuration

**Problem:** Hardcoding values makes applications inflexible and environment-dependent

**Solution:** Use environment variables and configuration management:

- Separate config from code
- Use environment-specific configuration files
- Never commit secrets to version control

**Prevention:** Use tools like dotenv, config validators, and secret scanners

### Pitfall 3: Ignoring Security Best Practices

**Problem:** Security vulnerabilities from not following established security patterns

**Solution:** Follow security guidelines:

- Input validation and sanitization
- Proper authentication and authorization
- Encrypted data transmission (TLS/SSL)
- Regular security audits and updates

**Prevention:** Use security linters, SAST tools, and regular dependency updates

**Best Practices:**

- Follow established patterns and conventions for serverless
- Keep dependencies up to date and scan for vulnerabilities
- Write comprehensive documentation and inline comments
- Use linting and formatting tools consistently
- Implement proper error handling and logging
- Regular code reviews and pair programming
- Monitor production metrics and set up alerts

---

## Level 3: Deep Dive Resources

### Official Documentation

- [AWS Lambda Developer Guide](https://docs.aws.amazon.com/lambda/)
- [Google Cloud Functions](https://cloud.google.com/functions/docs)
- [Azure Functions Documentation](https://learn.microsoft.com/azure/azure-functions/)
- [Serverless Framework](https://www.serverless.com/framework/docs)
- [AWS SAM](https://docs.aws.amazon.com/serverless-application-model/)

### Books and Courses

- **"Serverless Architectures on AWS"** by Peter Sbarski
- **"Production-Ready Serverless"** by Yan Cui
- **AWS Certified Solutions Architect** (includes serverless)
- **A Cloud Guru: AWS Lambda & Serverless**

### Tools and Libraries

- **AWS Lambda Powertools** (Python, TypeScript, Java)
- **Serverless Framework Plugins** (offline, webpack, prune)
- **Lumigo** (Observability and debugging)
- **Thundra** (APM for serverless)
- **AWS X-Ray** (Distributed tracing)

### Community Resources

- [Serverless Stack (SST)](https://sst.dev/)
- [Off-by-none Newsletter](https://offbynone.io/)
- [ServerlessLand Patterns](https://serverlessland.com/patterns)
- [AWS Samples GitHub](https://github.com/aws-samples)

### Bundled Resources

- `templates/lambda-function.py` - Production Lambda template
- `templates/sam-template.yaml` - SAM infrastructure template
- `templates/serverless.yml` - Serverless Framework config
- `templates/api-gateway.yaml` - API Gateway REST API
- `scripts/deploy-lambda.sh` - Automated deployment script
- `resources/serverless-patterns.md` - Common architecture patterns

Overview

This skill provides a concise, production-ready serverless development standard for starting projects quickly. It bundles battle-tested patterns, templates, and operational checklists focused on AWS Lambda and multi-cloud considerations. The goal is to help teams ship API backends, event-driven processors, and scheduled tasks with sensible defaults for performance, security, and cost.

How this skill works

The skill inspects common serverless concerns and supplies templates, configuration examples, and a checklist to enforce standards. It includes lifecycle guidance (init/invoke/shutdown), recommendations for cold-start optimization, observability patterns, and deployment options (SAM, Serverless Framework, CDK). It also offers integration points for CI/CD, monitoring, and downstream applications.

When to use it

  • New projects that must be production-ready from day one
  • Building HTTP APIs, event-driven workers, or cron/scheduled jobs
  • Migrating monolith tasks to functions for cost or scaling benefits
  • Prototyping fast features where time-to-market matters
  • When you need standard patterns for security, observability, and cost control

Best practices

  • Keep each function single-responsibility and idempotent
  • Minimize package size and initialize SDK clients outside the handler
  • Enforce least-privilege IAM roles and store secrets in a secrets manager
  • Right-size memory, consider ARM (Graviton2) for cost savings, and monitor duration trends
  • Use structured JSON logging, correlation IDs, and distributed tracing (X-Ray)

Example use cases

  • REST API backend via API Gateway → Lambda → database with caching and auth at the gateway
  • File processing pipeline: S3 upload → Lambda for transformation → store results in S3/DynamoDB
  • Event-driven ETL: SQS/Kinesis → Lambda → real-time analytics dashboard
  • Scheduled tasks: EventBridge cron → Lambda for cleanup, backups, or reports
  • Stream processing for clickstream or IoT ingestion with low-latency transforms

FAQ

How do I reduce cold start latency?

Use provisioned concurrency for latency-sensitive endpoints, shrink deployment packages, lazy-load heavy libraries, and pick faster runtimes (Go, Rust, Python 3.11).

What should I do about long-running work?

Avoid long synchronous functions. Break work into async patterns: SQS, Step Functions, or background jobs. Use orchestration for retries and visibility.