finops-cost-optimization skill

This skill helps you optimize OCI costs by identifying hidden traps, calculating exact savings, and maximizing Universal Credits and free tier usage.

npx playbooks add skill acedergren/oci-agent-skills --skill finops-cost-optimization

---
name: finops-cost-optimization
description: Use when optimizing OCI costs, investigating unexpected bills, planning budgets, or identifying waste. Covers hidden cost traps (boot volumes, reserved IPs, egress), Universal Credits gotchas, shape migration savings, free tier maximization, and cost allocation challenges.
license: MIT
metadata:
  author: alexander-cedergren
  version: "2.0.0"
---

# OCI FinOps - Expert Knowledge

## 🏗️ Use OCI Landing Zone Terraform Modules

**Don't reinvent the wheel.** Use [oracle-terraform-modules/landing-zone](https://github.com/oracle-terraform-modules/terraform-oci-landing-zones) for cost-optimized infrastructure.

**Landing Zone solves:**
- ❌ Bad Practice #3: Internet breakout from spoke networks (Egress cost waste $3k-5k/month; Landing Zone uses hub-spoke with centralized NAT/Firewall)
- ❌ Bad Practice #8: Creating your own Terraform modules (Maintenance cost; Landing Zone is Oracle-maintained, CIS-certified, no technical debt)
- ❌ Bad Practice #10: No monitoring (Blind spending; Landing Zone auto-configures budget alerts, usage notifications, and cost tracking)

**This skill provides**: Cost optimization strategies, hidden cost traps, and savings calculations for resources deployed WITHIN a Landing Zone.

---

## ⚠️ OCI CLI/API Knowledge Gap

**You don't know OCI CLI commands or OCI API structure.**

Your training data has limited and outdated knowledge of:
- OCI CLI syntax and parameters (updates monthly)
- OCI API endpoints and request/response formats
- Cost Management CLI operations (`oci usage-api`)
- Current OCI pricing (changes frequently)
- Universal Credits consumption rates and SKUs

**When OCI operations are needed:**
1. Use exact CLI commands from skill references
2. Do NOT guess OCI CLI syntax or parameters
3. Do NOT assume current pricing - always verify
4. Reference official Oracle pricing calculator

**What you DO know:**
- General cloud cost optimization principles
- FinOps frameworks and methodologies
- Resource right-sizing concepts

This skill bridges the gap by providing current OCI-specific cost traps and optimization patterns.

---

You are an OCI cost optimization expert. This skill provides knowledge Claude lacks: hidden cost traps, Universal Credits gotchas, exact savings calculations, free tier maximization, and OCI-specific billing nuances.

## NEVER Do This

❌ **NEVER ignore orphaned boot volumes (silent cost drain)**
```
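# Boot volumes can outlive their instance: the Console preserves them unless
# you opt in to deletion. Do not rely on CLI/API defaults; verify them and
# pass --preserve-boot-volume explicitly.
oci compute instance terminate --instance-id <ocid> --force
# If the boot volume was preserved, it remains and charges continue!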

Cost trap:
- Boot volume: 50 GB × $0.025/GB/mo = $1.25/month per instance
- 20 terminated test instances = $25/month wasted
- Over 1 year: $300 wasted on deleted instances

# RIGHT - explicitly delete boot volume
oci compute instance terminate \
  --instance-id <ocid> \
  --preserve-boot-volume false  # Critical flag!

# Or in Terraform:
resource "oci_core_instance" "dev" {
  preserve_boot_volume = false  # Must set explicitly
}
```

**Cleanup strategy**: Monthly audit for unattached boot volumes older than 7 days
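
The audit filter can be sketched in a few lines. This is a minimal sketch, assuming the volume list has already been parsed from CLI JSON output into dictionaries with `id`, `time-created`, and `size-in-gbs` keys, and that the set of attached boot volume IDs was collected separately. The function names are hypothetical and the $0.025/GB rate is the example figure from above, not verified pricing.

```python
from datetime import datetime, timedelta, timezone

RATE_PER_GB_MONTH = 0.025  # assumed rate from the example above; verify current pricing


def find_stale_boot_volumes(volumes, attached_ids, now, min_age_days=7):
    """Return unattached boot volumes created at least min_age_days ago."""
    cutoff = now - timedelta(days=min_age_days)
    stale = []
    for vol in volumes:
        if vol["id"] in attached_ids:
            continue  # still attached to an instance, not orphaned
        created = datetime.fromisoformat(vol["time-created"])
        if created <= cutoff:
            stale.append(vol)
    return stale


def monthly_waste(volumes, rate=RATE_PER_GB_MONTH):
    """Monthly storage charge in dollars for the given volumes."""
    return round(sum(v["size-in-gbs"] for v in volumes) * rate, 2)


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    sample = [
        {"id": "vol-old", "time-created": "2024-01-01T00:00:00+00:00", "size-in-gbs": 50},
        {"id": "vol-new", "time-created": "2024-01-30T00:00:00+00:00", "size-in-gbs": 50},
    ]
    stale = find_stale_boot_volumes(sample, attached_ids=set(), now=now)
    print([v["id"] for v in stale], monthly_waste(stale))
```

Passing `now` explicitly keeps the filter deterministic, which makes it easy to test before pointing it at real tenancy data.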

❌ **NEVER forget reserved public IPs cost money when unattached**
```
Cost: $0.01/hour = $7.30/month per IP

# Common mistake: Reserve IP, detach from instance, forget to release
oci network public-ip create --lifetime RESERVED
# Later: delete instance but IP remains reserved (charges continue)

Wasted cost example:
- 5 old reserved IPs from deleted instances
- 5 × $7.30/month = $36.50/month waste
- Over 1 year: $438 wasted

# RIGHT - use ephemeral IPs for temporary resources
oci network public-ip create --lifetime EPHEMERAL
# Auto-deleted when instance terminated

# Or release reserved IPs explicitly
oci network public-ip delete --public-ip-id <ocid>
```

**Detection**: List reserved IPs without instance attachment

❌ **NEVER assume stopped resources = zero cost**
```
Stopped Autonomous Database:
✓ Compute: Zero cost (stopped)
✗ Storage: $0.025/GB/mo continues
✗ Backups: Retention charges continue

Example: 1 TB ADB stopped for 30 days
Storage: 1000 GB × $0.025 = $25/month (charged!)

Stopped Compute Instance:
✓ Compute: Zero cost
✗ Boot volume: $0.025/GB/mo continues
✗ Block volumes: $0.025/GB/mo continues
✗ Reserved IP (if attached): $7.30/month continues

# Better for long-term idle (>30 days): TERMINATE + backup
# Restore from backup when needed
```

**Rule**: Stopped = compute paused, storage still charged

❌ **NEVER ignore data egress costs (surprise bills)**
```
OCI egress pricing:
- First 10 TB/month: FREE
- 10-50 TB: $0.0085/GB
- 50+ TB: Contact sales for discount

# Common mistake: Large data export without estimating egress first
oci os object bulk-download --bucket-name backups --download-dir /local
# Downloading 15 TB = 5 TB chargeable × $0.0085/GB = $42.50

# Alternatives to metered internet egress:
1. Use OCI FastConnect (fixed cost, unlimited egress; pays off only at sustained high volume)
2. Download within same region (free between OCI services)
3. Export to another OCI region first (inter-region = free)

Cost comparison (15 TB export):
- Internet egress: $42.50
- FastConnect (1 Gbps): $1,100/month flat rate
- Breakeven: ~129 TB/month of chargeable egress ($1,100 ÷ $0.0085/GB), assuming the $0.0085 rate still applies at that volume
```

**Gotcha**: Egress is FREE between OCI regions (intra-Oracle network)
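
The egress arithmetic is easy to get wrong by a factor of 1,000 when mixing TB and GB, so it helps to compute it. A minimal sketch, assuming the free allowance and per-GB rate listed above and a single flat rate beyond the free tier (the 50+ TB tier is negotiated, so it is not modeled). The function names are hypothetical.

```python
FREE_TIER_GB = 10_000   # first 10 TB/month free (figure from the list above)
RATE_PER_GB = 0.0085    # 10-50 TB tier rate (figure from the list above)


def internet_egress_cost(total_gb, free_gb=FREE_TIER_GB, rate=RATE_PER_GB):
    """Dollar cost of internet egress after subtracting the free allowance."""
    chargeable_gb = max(0, total_gb - free_gb)
    return round(chargeable_gb * rate, 2)


def flat_fee_breakeven_gb(flat_monthly, free_gb=FREE_TIER_GB, rate=RATE_PER_GB):
    """Total monthly egress in GB at which a flat fee equals metered egress."""
    return free_gb + flat_monthly / rate


if __name__ == "__main__":
    print(internet_egress_cost(15_000))          # 15 TB export
    print(round(flat_fee_breakeven_gb(1_100)))   # vs. a $1,100/month flat fee
```

Keeping every quantity in GB avoids the unit slip. Compare the breakeven volume against your own sustained monthly egress before paying for a flat-rate link.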

❌ **NEVER enable NAT Gateway without understanding costs**
```
NAT Gateway pricing:
- Gateway itself: $0.01/hour = $7.30/month
- Data processed: $0.01/GB

# Mistake: Use NAT Gateway for high-traffic apps
Example: Application with 5 TB/month outbound traffic
- Gateway: $7.30/month
- Data: 5000 GB × $0.01 = $50/month
- Total: $57.30/month

# Cheaper alternative: Public IP on instance (ephemeral IP = free)
Cost: $0/month for egress <10 TB

When NAT Gateway makes sense:
- Private subnets with <100 GB/month egress
- Security requirement (no public IPs on instances)

When to avoid:
- High-traffic egress (use public IPs instead)
- Low-security dev/test environments
```

❌ **NEVER over-commit with Universal Credits**
```
Universal Credits gotcha:
- Credits are NON-TRANSFERABLE between service categories
  * Compute credits → can only pay for compute
  * Database credits → can only pay for database
  * Cannot move credits between categories

# Mistake: Commit to $10k/month compute credits
# Reality: Only use $6k/month compute
# Cannot use $4k surplus for database spending
# Result: Waste $4k/month ($48k/year)

# RIGHT - commit conservatively based on baseline usage
Analyze 3-6 months historical usage
Commit to 70-80% of baseline (not peak)

Expiration gotcha:
- Monthly credits expire end of month
- Cannot roll over to next month
- Use-it-or-lose-it model
```

❌ **NEVER trust forecast budgets (30-40% error rate)**
```
OCI Budget types:
1. ACTUAL: Alerts on actual spending (accurate)
2. FORECAST: Alerts on projected spending (often wrong)

# Problem: Forecast uses simple linear projection
Example:
- Week 1: $100 spend
- Forecast for month: $100 × 4 = $400
- Reality: Week 1 included one-time data migration
- Actual month: $150 total
- Forecast error: 167% over-prediction

# WRONG - rely on forecast alerts
Alert at 80% forecast → fires prematurely

# RIGHT - use actual spend alerts at multiple thresholds
Alert at 50% actual, 80% actual, 90% actual, 100% actual
```

**Best practice**: Use FORECAST for trends, ACTUAL for budgeting
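
The forecast example can be reproduced numerically. This is only a sketch of the simple linear projection described in the example, not OCI's actual forecasting logic, and both function names are hypothetical.

```python
def linear_forecast(spend_so_far, periods_elapsed, periods_total):
    """Project full-period spend by scaling average spend per elapsed period."""
    return spend_so_far / periods_elapsed * periods_total


def over_prediction_pct(forecast, actual):
    """Forecast error expressed as a percentage of actual spend."""
    return (forecast - actual) / actual * 100


if __name__ == "__main__":
    forecast = linear_forecast(100, periods_elapsed=1, periods_total=4)
    print(forecast, round(over_prediction_pct(forecast, actual=150)))
```

A one-time charge early in the month inflates the per-period average, which is why the projection overshoots so badly in the example.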

## Cost Optimization Strategies

### Shape Migration Savings (Exact Calculations)

**Fixed → Flex Shape Migration**

```
Legacy fixed shape:
VM.Standard2.4: 4 OCPUs, 60 GB RAM (fixed ratio 1:15)
Cost: $0.06/hr × 4 = $0.24/hr = $175/month

Flex shape (right-sized):
VM.Standard.E4.Flex: 4 OCPUs, 16 GB RAM (custom ratio)
Cost: ($0.02/OCPU-hr × 4) + ($0.0015/GB-hr × 16) = $0.104/hr = $76/month

Savings: $99/month per instance (56% reduction)

Migration path:
1. Create flex instance with same OCPU count
2. Test with minimal RAM (4 GB per OCPU)
3. Increase RAM only if needed
4. Delete fixed instance
```
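
The same comparison can be parameterized so other OCPU and memory combinations are quick to evaluate. A minimal sketch using the example rates above as inputs (they are illustrative, so verify current pricing); the function names are hypothetical and a 730-hour month is assumed.

```python
HOURS_PER_MONTH = 730  # assumed month length for monthly estimates


def fixed_shape_monthly(ocpus, rate_per_ocpu_hour, hours=HOURS_PER_MONTH):
    """Monthly cost of a fixed shape billed per OCPU-hour (memory bundled)."""
    return ocpus * rate_per_ocpu_hour * hours


def flex_shape_monthly(ocpus, memory_gb, ocpu_rate, memory_rate, hours=HOURS_PER_MONTH):
    """Monthly cost of a flex shape billed separately for OCPUs and memory."""
    return (ocpus * ocpu_rate + memory_gb * memory_rate) * hours


if __name__ == "__main__":
    fixed = fixed_shape_monthly(4, 0.06)
    flex = flex_shape_monthly(4, 16, ocpu_rate=0.02, memory_rate=0.0015)
    print(round(fixed), round(flex), round(fixed - flex))
```

Because memory is priced separately on flex shapes, re-running the calculation with a larger `memory_gb` shows how quickly the saving shrinks if the workload really needs the RAM.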

**AMD → Arm Migration**

```
AMD instance:
VM.Standard.E4.Flex: 4 OCPUs, 16 GB RAM
Cost (OCPU only, memory excluded): $0.02/OCPU-hr × 4 × 730 = $58.40/month

Arm instance (same workload):
VM.Standard.A1.Flex: 4 OCPUs, 16 GB RAM
Cost (OCPU only, memory excluded): $0.01/OCPU-hr × 4 × 730 = $29.20/month

Savings: $29.20/month per instance (50% reduction)

Gotcha: ARM64 architecture (not all apps compatible)
Check: Docker images available for ARM64?
```

### Free Tier Maximization

**Exact cost avoidance** (what you DON'T pay):

```
Always-Free tier value if fully utilized:

Compute:
- 2 AMD Micro VMs: 2 × $7/month = $14/month
- 4 Arm OCPUs (24 GB): 4 × $7.30/month = $29.20/month
Subtotal: $43.20/month

Database:
- 2 Autonomous Databases: 2 × $292/month = $584/month

Storage:
- 200 GB block: $5/month
- 10 GB object: $0.26/month
- 10 GB archive: $0.02/month
Subtotal: $5.28/month

Networking:
- 1 load balancer: $10/month
- 10 TB egress: $85/month
Subtotal: $95/month

TOTAL COST AVOIDED: $727.48/month ($8,730/year)
```

**Gotcha**: 2 ADB limit is TENANCY-wide (not per region/compartment)

### Storage Lifecycle Optimization

**Automated tiering savings**:

```
Scenario: 10 TB compliance data, accessed quarterly

Without tiering (all Standard):
10,000 GB × $0.0255/GB/mo = $255/month = $3,060/year

With tiering (Archive after 30 days):
Month 1 (Standard): 10,000 GB × $0.0255 = $255
Months 2-12 (Archive): 10,000 GB × $0.0024 × 11 = $264
Total year: $519

Savings: $2,541/year (83% reduction)

Lifecycle policy:
- Day 0-30: Standard (frequent access window)
- Day 31+: Archive (compliance retention)
- Retrieval cost: $0.01/GB (acceptable for quarterly access)
```
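
The tiering scenario leaves retrieval charges out of the savings figure, so it is worth modeling them alongside storage. A minimal sketch, assuming the per-GB rates quoted in the scenario; the function name and the `retrieved_gb` parameter are hypothetical.

```python
def annual_storage_cost(size_gb, standard_rate, archive_rate,
                        standard_months=1, retrieved_gb=0, retrieval_rate=0.01):
    """Annual cost: Standard tier first, Archive for the rest, plus retrievals."""
    archive_months = 12 - standard_months
    storage = size_gb * (standard_rate * standard_months
                         + archive_rate * archive_months)
    return storage + retrieved_gb * retrieval_rate


if __name__ == "__main__":
    all_standard = annual_storage_cost(10_000, 0.0255, 0.0024, standard_months=12)
    tiered = annual_storage_cost(10_000, 0.0255, 0.0024)
    # Hypothetical worst case: the full 10 TB is restored every quarter
    tiered_with_reads = annual_storage_cost(10_000, 0.0255, 0.0024,
                                            retrieved_gb=40_000)
    print(round(all_standard), round(tiered), round(tiered_with_reads))
```

At the listed $0.01/GB rate, restoring the full 10 TB four times a year adds about $400, so the real saving depends on how much data each quarterly access actually touches.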

### Compute Auto-Shutdown Savings

**Dev/test environment automation**:

```
Scenario: 10 development instances, 2 OCPUs each
Running 24/7: 10 × 2 × $0.02/hr × 730 = $292/month

Auto-shutdown (weekdays 9am-6pm only):
Usage: 9 hours/day × 5 days = 45 hours/week = 195 hours/month
Cost: 10 × 2 × $0.02/hr × 195 = $78/month

Savings: $214/month (73% reduction)

Implementation:
1. Tag instances: Environment=Development
2. Create Functions for start/stop
3. Schedule with OCI Events:
   - Start: Weekdays 9am
   - Stop: Weekdays 6pm
```
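
The scheduling rule and the savings estimate can both be expressed as small pure functions. This is a sketch of the decision logic only, assuming the weekday 9am-6pm window from the scenario; it does not call any OCI API, and wiring it to real start/stop actions is left to whatever scheduler you use. The function names are hypothetical.

```python
from datetime import datetime


def should_be_running(now, start_hour=9, stop_hour=18):
    """True on weekdays from start_hour (inclusive) to stop_hour (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start_hour <= now.hour < stop_hour


def monthly_savings(instances, ocpus_each, ocpu_rate, hours_on, hours_full=730):
    """Dollars saved per month by running hours_on instead of hours_full."""
    hourly = instances * ocpus_each * ocpu_rate
    return hourly * (hours_full - hours_on)


if __name__ == "__main__":
    print(should_be_running(datetime(2024, 1, 8, 10)))   # a Monday morning
    print(should_be_running(datetime(2024, 1, 6, 10)))   # a Saturday morning
    print(round(monthly_savings(10, 2, 0.02, hours_on=195)))
```

`now` should be in the team's local time zone; a schedule evaluated in UTC will shift the window for everyone else.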

## Universal Credits Gotchas

**Credit categories** (non-transferable):

| Category | Services Included | Typical Use |
|----------|------------------|-------------|
| **Compute** | VM, bare metal, container instances, OKE | Applications, workloads |
| **Database** | ADB, DB Systems, Exadata, MySQL | Data storage, transactions |
| **Storage** | Block, object, file, archive | Backups, data lakes |
| **Network** | Load balancer, FastConnect, NAT GW | Connectivity, egress |

**Commitment trap**:

```
Scenario: Commit to $5k/month database credits
Month 1-3: Use only $3k/month (waste $2k/month)
Cannot: Transfer $2k to compute spending
Result: Waste $6k over 3 months

Better approach:
1. Analyze 6 months baseline usage per category
2. Commit to 70% of baseline (room for variance)
3. Over-commit only if growth guaranteed
```
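
The commitment sizing and the resulting waste can be estimated from historical usage. A minimal sketch, assuming the no-rollover model described in this skill and the 70% guideline from the steps above; the function names are hypothetical.

```python
def recommended_commit(monthly_usage, fraction=0.7):
    """Commit to a fraction of the average historical monthly usage."""
    baseline = sum(monthly_usage) / len(monthly_usage)
    return baseline * fraction


def wasted_credits(commit, monthly_usage):
    """Credits lost with no rollover: unused commitment, summed per month."""
    return sum(max(0, commit - used) for used in monthly_usage)


if __name__ == "__main__":
    usage = [3_000, 3_000, 3_000]            # dollars per month, per category
    print(wasted_credits(5_000, usage))      # over-committed scenario
    conservative = recommended_commit(usage)
    print(round(conservative), wasted_credits(conservative, usage))
```

Run it per credit category, since the waste in one category cannot be offset by overage in another under this model.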

**Expiration**: Monthly credits = use-it-or-lose-it (no rollover)

## Cost Allocation Challenges

**Shared resource allocation** (OCI-specific):

```
Problem: How to allocate costs for shared VCN?

VCN costs:
- DRG: $0.01/hr = $7.30/month
- NAT Gateway: $0.01/hr + $0.01/GB = variable
- FastConnect: $1,100/month (1 Gbps)

Allocation strategies:
1. Equal split: $1,107.30 / N teams
2. Usage-based: Track egress per team (requires tagging)
3. Chargeback: Production pays 100%, dev/test free

Gotcha: OCI doesn't auto-allocate shared resource costs
Must implement manual showback/chargeback process
```

**Load balancer cost allocation**:

```
Problem: Multiple apps share 1 load balancer

LB costs:
- Flexible LB: $10/month + $0.008/LCU-hour
- Bandwidth: $0.01/GB processed

Allocation approach:
1. Tag backends by team (freeform_tags.Team)
2. Estimate traffic % per team (monitoring metrics)
3. Allocate LB cost proportionally

Example:
- Total LB cost: $50/month
- Team A: 60% traffic → $30/month
- Team B: 40% traffic → $20/month
```
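
Proportional allocation is a one-line formula once per-team usage is known. A minimal sketch, assuming usage shares (traffic percentages, bytes, or request counts) have already been estimated from monitoring metrics; the function name is hypothetical.

```python
def allocate_by_share(total_cost, usage_by_team):
    """Split total_cost across teams in proportion to their measured usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        raise ValueError("no usage recorded; cannot allocate proportionally")
    return {
        team: round(total_cost * usage / total_usage, 2)
        for team, usage in usage_by_team.items()
    }


if __name__ == "__main__":
    print(allocate_by_share(50, {"Team A": 60, "Team B": 40}))
```

The same function works for the shared VCN case: pass each team's egress volume as the usage figure instead of load balancer traffic.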

## Budget Best Practices

**Multi-threshold alerts**:

```
# Best practice: Set 4 alert levels
# Recipient addresses are placeholders; substitute your own distribution lists.
oci budgets alert-rule create \
  --budget-id <ocid> \
  --type ACTUAL \
  --threshold 50 \
  --threshold-type PERCENTAGE \
  --recipients "<recipient-1>"

oci budgets alert-rule create \
  --budget-id <ocid> \
  --type ACTUAL \
  --threshold 80 \
  --threshold-type PERCENTAGE \
  --recipients "<recipient-1>,<recipient-2>"

oci budgets alert-rule create \
  --budget-id <ocid> \
  --type ACTUAL \
  --threshold 90 \
  --threshold-type PERCENTAGE \
  --recipients "<recipient-1>,<recipient-2>,<recipient-3>"

oci budgets alert-rule create \
  --budget-id <ocid> \
  --type ACTUAL \
  --threshold 100 \
  --threshold-type PERCENTAGE \
  --recipients "<recipient-1>,<recipient-2>,<recipient-3>"

Alert strategy:
50%: FYI to team (informational)
80%: Action required (investigate spike)
90%: Escalation to management
100%: Finance involved (budget exceeded)
```

**Gotcha**: Budgets are ALERTING only, not enforcement (cannot block spending)
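
Because budgets only alert, it can help to dry-run the escalation ladder locally before creating the rules. This sketch simulates which levels would have fired for a given spend; it mirrors the alert strategy above and does not interact with the OCI Budgets service. The names are hypothetical.

```python
# Thresholds (percent of budget) and meanings from the alert strategy above
ALERT_LEVELS = [
    (50, "FYI to team"),
    (80, "Action required"),
    (90, "Escalation to management"),
    (100, "Finance involved"),
]


def triggered_alerts(actual_spend, budget):
    """Return the alert labels whose percentage threshold has been reached."""
    percent_used = actual_spend / budget * 100
    return [label for threshold, label in ALERT_LEVELS if percent_used >= threshold]


if __name__ == "__main__":
    print(triggered_alerts(850, 1_000))
```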

## Hidden Cost Detection

**Monthly audit checklist**:

```bash
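# 1. Orphaned boot volumes (cost trap #1)
# The list output has no attachment field; cross-check volume IDs against
# `oci compute boot-volume-attachment list` to find the unattached ones.
oci bv boot-volume list --all --lifecycle-state AVAILABLE

# 2. Unattached block volumes (cross-check against `oci compute volume-attachment list`)
oci bv volume list --all --lifecycle-state AVAILABLE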

# 3. Reserved IPs without instance
oci network public-ip list --scope REGION --lifetime RESERVED

# 4. Stopped instances with volumes
oci compute instance list --lifecycle-state STOPPED

# 5. Old snapshots/backups
oci bv backup list --all \
  | jq '.data[] | select(.["time-created"] < "2024-01-01")'

# 6. Unused load balancers (no backends)
oci lb load-balancer list --all

# 7. Empty Object Storage buckets (size and count fields are returned per bucket by `bucket get`)
oci os bucket get --bucket-name <name> --fields approximateCount --fields approximateSize
```

**Estimated monthly savings**: $100-500 for typical tenancy (cleanup waste)

## Progressive Loading References

### OCI Cost Management CLI

**WHEN TO LOAD** [`oci-cost-cli.md`](references/oci-cost-cli.md):
- Setting up budgets and alert rules
- Querying usage reports via CLI
- Managing service limits and quotas
- Implementing tagging for cost allocation
- Downloading detailed cost reports

**Do NOT load** for:
- Quick cost calculations (tables and examples above)
- Understanding cost traps (NEVER list above)
- Shape pricing comparisons (covered above)

### Official Oracle Documentation Sources

**Primary References** (200+ official sources scraped):
- [Billing and Cost Management Overview](https://docs.oracle.com/en-us/iaas/Content/Billing/Concepts/billingoverview.htm)
- [Cost Reports Overview](https://docs.oracle.com/en-us/iaas/Content/Billing/Concepts/costusagereportsoverview.htm)
- [Budgets Overview](https://docs.oracle.com/en-us/iaas/Content/Billing/Concepts/budgetsoverview.htm)
- [Cost Management Best Practices](https://docs.oracle.com/en-us/iaas/Content/cloud-adoption-framework/era-cost-management.htm)
- [OCI FinOps Hub](https://docs.oracle.com/en-us/iaas/Content/Billing/Concepts/FinOps.htm)
- [Cloud Advisor (Cost Optimization)](https://docs.oracle.com/en-us/iaas/Content/CloudAdvisor/home.htm)

**Note**: All cost calculations, traps, and gotchas in this skill are derived from these official sources and the Oracle A-Team blog.

---

## When to Use This Skill

- Unexpected bills: Investigating charges, finding cost spikes
- Budget planning: Estimating costs, setting budgets, forecasting
- Cost optimization: Right-sizing, waste reduction, shape migration
- Free tier: Maximizing always-free usage, avoiding over-provisioning
- Universal Credits: Commitment planning, credit allocation
- Cost allocation: Showback, chargeback, shared resource costs
- Savings calculations: Comparing options, ROI analysis

Overview

This skill helps optimize Oracle Cloud Infrastructure (OCI) costs by identifying hidden billing traps, calculating exact savings from shape migrations, and guiding Universal Credits and budget planning. It focuses on practical actions: cleaning orphaned resources, right-sizing instances, egress strategies, free tier maximization, and reliable cost allocation. Use it to reduce surprise bills and improve ongoing FinOps discipline.

How this skill works

The skill inspects common OCI cost leak sources and prescribes audits, configurations, and migration paths. It provides precise examples and arithmetic for savings (e.g., flex vs fixed shapes, ARM vs AMD, storage tiering), plus concrete detection commands and remediation steps. It also explains Universal Credits categories, commit strategy, and multi-threshold budgeting best practices.

When to use it

  • Investigating unexpected OCI bills or monthly anomalies
  • Designing cost-aware landing zones and Terraform deployments
  • Planning Universal Credits commitments and category sizing
  • Right-sizing compute (shape or architecture migration) and storage lifecycle policies
  • Setting up budget alerts, showback/chargeback, or monthly cost audits

Best practices

  • Use Oracle-maintained landing zone modules to avoid egress and networking anti-patterns
  • Audit monthly for orphaned boot volumes, unattached block volumes, reserved IPs, and stale backups
  • Set preserve_boot_volume=false or explicitly delete boot volumes when terminating instances
  • Commit Universal Credits conservatively (70–80% of baseline) and analyze 3–6 months of usage
  • Use ACTUAL budget alerts at multiple thresholds (50, 80, 90, 100%) and treat FORECAST as trend guidance only

Example use cases

  • Clean up orphaned boot volumes and reserved public IPs to stop silent monthly charges
  • Migrate fixed shapes to flex shapes or AMD→ARM for 40–60% cost reduction per instance
  • Implement storage lifecycle policies to archive cold compliance data and save >80% on storage costs
  • Evaluate egress strategy: choose FastConnect vs internet egress based on breakeven and monthly TB
  • Automate dev/test auto-shutdown and tag-based schedules to cut 60–75% of dev spend

FAQ

How do Universal Credits categories affect commit decisions?

Credits are non-transferable across categories. Analyze historical usage per category and commit conservatively (70–80% of baseline) to avoid wasted credits.

Are stopped instances free?

No. Compute charges pause, but storage (boot and block volumes), backups, and reserved IPs can continue accruing charges; consider termination plus backup for long idle periods.