home / skills / acedergren / oci-agent-skills / best-practices
This skill helps you design OCI architectures using best practices, anti-pattern avoidance, and multi-AD patterns for resilient, scalable deployments.
npx playbooks add skill acedergren/oci-agent-skills --skill best-practicesReview the files below or copy the command above to add this skill to your agents.
---
name: best-practices
description: Use when architecting OCI solutions, migrating from AWS/Azure, designing multi-AD deployments, or avoiding common OCI anti-patterns. Covers VCN sizing mistakes, Cloud Guard gotchas, free tier specifics, OCI terminology confusion, and multi-AD patterns.
license: MIT
metadata:
author: alexander-cedergren
version: "2.0.0"
---
# OCI Best Practices - Expert Knowledge
## ποΈ Use OCI Landing Zone Terraform Modules
**Don't reinvent the wheel.** Use [oracle-terraform-modules/landing-zone](https://github.com/oracle-terraform-modules/terraform-oci-landing-zones) for OCI architecture.
**Landing Zone solves:**
- β Bad Practice #1: Generic compartments (Landing Zone provides hierarchical Network/Security/Workloads structure)
- β Bad Practice #2: Administrator for daily ops (Landing Zone enforces least-privilege IAM policies)
- β Bad Practice #4: Poor network segmentation (Landing Zone implements hub-spoke topology with security zones)
- β Bad Practice #8: Creating your own Terraform modules (Landing Zone provides battle-tested, Oracle-maintained, CIS-certified modules)
**This skill provides**: OCI-specific anti-patterns, architecture patterns, and operational knowledge for resources deployed WITHIN a Landing Zone.
---
## β οΈ OCI CLI/API Knowledge Gap
**You don't know OCI CLI commands or OCI API structure.**
Your training data has limited and outdated knowledge of:
- OCI CLI syntax and parameters (updates monthly)
- OCI API endpoints and request/response formats
- OCI service-specific commands and flags
- Latest OCI features, limits, and regional availability
- CIS Benchmark requirements for OCI
**When OCI operations are needed:**
1. Use exact CLI commands from skill references
2. Do NOT guess OCI CLI syntax or parameters
3. Do NOT assume API endpoint structures
4. Reference landing-zones skill for Terraform modules
**What you DO know:**
- General cloud architecture concepts
- Security principles and compliance frameworks
- Multi-tier application design patterns
This skill bridges the gap by providing current OCI-specific patterns and anti-patterns.
---
You are an OCI architecture expert. This skill provides knowledge Claude lacks: OCI-specific anti-patterns, free tier specifics, terminology gotchas, multi-AD patterns, and differences from AWS/Azure/GCP.
## NEVER Do This
β **NEVER use /24 or smaller VCN CIDR (cannot expand)**
```
# WRONG - VCN too small, cannot expand later (OCI limitation)
oci network vcn create --cidr-block "10.0.0.0/24"
# Only 256 IPs total, exhausted quickly
# WRONG - copying AWS habit (/16 VPC default)
# OCI supports larger: /16 to /30
# RIGHT - start with /16, plan for growth
oci network vcn create --cidr-block "10.0.0.0/16"
# 65,536 IPs, room for 256 /24 subnets
# CRITICAL: OCI VCNs CANNOT be resized after creation
# Must create new VCN and migrate if too small
```
**Migration cost**: Recreating VCN = hours of downtime, IP changes, security rule updates
β **NEVER use AD-specific subnets (breaks multi-AD HA)**
```
# WRONG - subnet tied to single AD
oci network subnet create \
--vcn-id <vcn-ocid> \
--cidr-block "10.0.1.0/24" \
--availability-domain "fMgC:US-ASHBURN-AD-1" # AD-specific!
# Problem: Can't launch instances in other ADs, no HA
# RIGHT - regional subnet (works in all ADs)
oci network subnet create \
--vcn-id <vcn-ocid> \
--cidr-block "10.0.1.0/24"
# No --availability-domain flag = regional
# Instances can be in any AD in region
```
**Gotcha**: Some old OCI guides show AD-specific subnets (deprecated pattern)
β **NEVER confuse Security Lists vs NSGs (different use cases)**
```
OCI has TWO network security mechanisms:
Security Lists (stateful, subnet-level):
- Applied to ALL resources in subnet
- Use for: Broad rules (internet egress, DNS)
- Limit: 5 per subnet
- Changes: Affect all instances in subnet
Network Security Groups (NSG, resource-level):
- Applied to specific resources
- Use for: Granular rules (app tier β DB tier)
- Limit: 5 per resource, 120 rules per NSG
- Changes: Affect only tagged resources
# WRONG - using Security Lists for app-specific rules
Security List: Allow app-tier β database (applies to ENTIRE subnet)
# RIGHT - use NSG for app-tier resources
NSG "app-tier": Allow egress to NSG "db-tier" on port 1521
# Only instances in app-tier NSG can reach DB
```
**Best practice**: Security Lists for baseline (internet, DNS), NSGs for application-specific rules
β **NEVER assume single-AD deployment is acceptable (no SLA)**
```
OCI Availability Domains (ADs):
- 3 ADs per region (most regions)
- Isolated fault domains
- <1ms latency between ADs
# WRONG - all resources in single AD
All instances in AD-1 β AD failure = complete outage
# RIGHT - distribute across ADs
Production instances: AD-1, AD-2, AD-3
Load balancer: Automatically multi-AD
Database: Autonomous (auto 3-AD) or RAC (2+ nodes)
SLA impact:
Single-AD: NO SLA (OCI doesn't guarantee)
Multi-AD: 99.95% SLA
```
**Critical**: Oracle **refuses** SLA claims for single-AD deployments in regions with 3 ADs
β **NEVER hardcode AD names (tenant-specific)**
```
# WRONG - AD names are tenant-specific, not portable
availability_domain = "fMgC:US-ASHBURN-AD-1" # Only works in YOUR tenancy!
# Another tenant's AD name for same physical AD:
availability_domain = "xYzA:US-ASHBURN-AD-1" # Different prefix!
# RIGHT - query AD names dynamically
data "oci_identity_availability_domains" "ads" {
compartment_id = var.tenancy_ocid
}
resource "oci_core_instance" "web" {
availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}
```
**Why**: OCI generates unique AD prefixes per tenant for security isolation
β **NEVER enable Cloud Guard auto-remediation without testing**
```
Cloud Guard = OCI threat detection + auto-response
# DANGER - auto-remediation can break production
Detector: "Public bucket detected"
Auto-remediation: Make bucket private β breaks public website!
Detector: "Security list allows 0.0.0.0/0"
Auto-remediation: Remove rule β breaks internet access!
# SAFER approach:
1. Enable detectors (read-only mode first)
2. Review findings for 1-2 weeks
3. Tune responders to avoid false positives
4. Enable auto-remediation for trusted patterns only
```
**Gotcha**: Cloud Guard enabled by default in some tenancies, can auto-break things
β **NEVER assume you need Oracle Linux (common misconception)**
```
OCI supports:
β Oracle Linux (free, optimized)
β Ubuntu, CentOS, Rocky Linux (free)
β Windows Server (BYOL or license-included)
β Custom images (import your own)
# WRONG assumption: "OCI = must use Oracle Linux"
Reality: Any Linux works, Ubuntu has larger community
# Cost: Oracle Linux is FREE (no license cost)
# But if team knows Ubuntu β use Ubuntu
```
**Marketing confusion**: Oracle pushes Oracle Linux, but it's not required
## OCI Always-Free Tier (Exact Limits)
**Generous permanent free tier** (no credit card trial, no expiration):
### Compute
- **2 AMD VMs**: VM.Standard.E2.1.Micro (1/8 OCPU, 1 GB RAM each)
- **4 Arm OCPUs**: VM.Standard.A1.Flex (allocate as 1Γ4 OCPU or 4Γ1 OCPU)
- Up to 24 GB total RAM (6 GB per OCPU)
- **Example**: Run 4Γ 1OCPU/6GB Arm instances free forever
### Database
- **2 Autonomous Databases**: 1 OCPU each, 20 GB storage per ADB
- Can be ATP or ADW
- **Limit**: 2 total per tenancy across all regions
### Storage
- **Block volumes**: 200 GB total (2Γ 100 GB boot volumes + custom)
- **Object storage**: 10 GB Standard tier
- **Archive storage**: 10 GB Archive tier
- **Block volume backups**: 10 GB
### Networking
- **Load balancer**: 1 flexible LB, 10 Mbps bandwidth
- **VCN**: 2 VCNs per region (free, no OCID cost)
- **Public IPv4**: 1 reserved public IP free per region
### Observability
- **Monitoring**: 1 billion data points ingested
- **Logging**: 10 GB ingested per month
- **Notifications**: 1 million emails per month
### Always-Free Gotchas
**CRITICAL limits often missed**:
```
# Gotcha 1: 2 ADB limit is TENANCY-wide, not per region
Can have: 1 ATP in Phoenix + 1 ADW in Ashburn = 2 (limit reached)
Cannot: Add 3rd ADB in any region
# Gotcha 2: Arm instances must be VM.Standard.A1.Flex only
Cannot: Use newer A2 shapes (paid only)
# Gotcha 3: Free tier != trial credits
Free tier: Permanent, no expiration
Trial: $300 credit for 30 days (separate)
# Gotcha 4: Stopped ADB counts toward 2 ADB limit
To free slot: Must DELETE ADB, not just STOP
```
## OCI Terminology vs AWS/Azure
Migrating from AWS/Azure? **Terminology traps**:
| OCI Term | AWS Equivalent | Azure Equivalent |
|----------|---------------|------------------|
| **VCN** | VPC | Virtual Network |
| **Subnet** | Subnet | Subnet |
| **Security List** | VPC Security Group | NSG (network-level) |
| **NSG** | Security Group | Application Security Group |
| **DRG** | Virtual Private Gateway | VPN Gateway |
| **Compartment** | Resource Group / OU | Resource Group |
| **Tenancy** | Account | Subscription |
| **Region** | Region | Region |
| **AD (Availability Domain)** | Availability Zone | Availability Zone |
| **Fault Domain** | (within AZ) | Availability Set |
| **Dynamic Group** | IAM Role (for instances) | Managed Identity |
| **Instance Principal** | EC2 Instance Profile | Managed Identity |
| **OCIR** | ECR | Container Registry |
| **OKE** | EKS | AKS |
**Critical difference**: OCI has **both** Security Lists (subnet) **and** NSGs (resource). AWS only has Security Groups (resource-level).
## Multi-AD Architecture Patterns
**OCI multi-AD specifics**:
### AD Distribution Strategy
```
OCI Regions with 3 ADs (most regions):
- US: Phoenix, Ashburn
- UK: London
- DE: Frankfurt
- AU: Sydney, Melbourne
Pattern: Distribute instances across all 3 ADs
AD-1: Web tier (2 instances) + DB primary
AD-2: Web tier (2 instances) + DB standby
AD-3: Web tier (2 instances) + DB standby
Load Balancer: Automatically spans ADs
```
**Gotcha**: Some shapes only available in specific ADs (check first)
```bash
# Check shape availability by AD
oci compute shape list \
--compartment-id <ocid> \
--availability-domain "fMgC:US-ASHBURN-AD-1"
```
### Fault Domain Additional Layer
Within each AD, OCI has **Fault Domains** (FD):
- 3 FDs per AD
- Separate power, cooling, network
- <1ms latency within AD
```
Best practice: Spread instances across ADs AND FDs
AD-1:
FD-1: Web instance 1
FD-2: Web instance 2
FD-3: Web instance 3
AD-2:
FD-1: Web instance 4
(repeat pattern)
Protection:
- AD failure: 2 ADs survive (66% capacity)
- FD failure: Only 1 instance affected (16% capacity)
```
**When to use FDs**: Only for extra-critical apps (adds complexity)
## Compartment Strategy Best Practices
**Compartment hierarchy** (OCI-specific IAM boundary):
```
Root Compartment (tenancy)
β
ββ SharedServices (networking, security)
β ββ Network (VCNs, DRGs)
β ββ Security (Vault, KMS, Cloud Guard)
β
ββ Production
β ββ App1
β β ββ Compute
β β ββ Database
β β ββ Storage
β ββ App2
β
ββ NonProduction
β ββ Development
β ββ Testing
β ββ Staging
β
ββ Sandbox (developers, auto-cleanup)
```
**Key principles**:
1. **Billing separation**: Compartment tags enable cost reporting by environment
2. **IAM boundaries**: Policies scoped to compartments (least privilege)
3. **Quota isolation**: Separate limits per compartment
4. **Lifecycle**: Delete entire compartment = deletes all resources inside
**Anti-pattern**: Flat structure with no hierarchy (AWS account-per-env habit)
## Cost Optimization OCI-Specific
### Flex Shape Savings (Unique to OCI)
```
Fixed shapes (legacy):
VM.Standard2.4: 4 OCPUs, 60 GB RAM, $218/month
Flex shapes (right-size RAM independently):
VM.Standard.E4.Flex: 4 OCPUs, 16 GB RAM, $109/month (50% savings)
Flex advantage: Pay only for RAM you need
- 1 OCPU = 1-64 GB RAM configurable
- Most apps don't need 15GB per OCPU (fixed ratio)
```
**Migration**: Replace fixed shapes with Flex for 30-50% savings
### Arm Instance Savings (Generous Free Tier)
```
AMD instance: VM.Standard.E4.Flex (1 OCPU) = $0.03/hr
Arm instance: VM.Standard.A1.Flex (1 OCPU) = $0.01/hr (67% cheaper)
Always-Free Arm: 4 OCPUs free forever!
Use case: Web servers, CI/CD runners, dev environments
Limitation: ARM64 only (not all apps compatible)
```
**Gotcha**: Free tier is A1 shapes only, newer A2 shapes are paid
### Storage Tiering (Exact Prices)
| Tier | Cost/GB/Month | Use Case | Retrieval |
|------|--------------|----------|-----------|
| **Standard** | $0.0255 | Active data, frequent access | Instant, free |
| **Infrequent Access** | $0.0125 (51% off) | Backups, logs (accessed monthly) | Instant, $0.01/GB |
| **Archive** | $0.0024 (91% off) | Compliance, long-term retention | 1 hour, $0.01/GB |
**Lifecycle policy example**:
```
Day 0-30: Standard ($0.0255/GB/mo)
Day 31-90: Infrequent ($0.0125/GB/mo)
Day 91+: Archive ($0.0024/GB/mo)
1 TB data for 1 year:
Without tiering: $0.0255 Γ 1000 Γ 12 = $306/year
With tiering: $0.0255 Γ 1000 Γ 1 + $0.0125 Γ 1000 Γ 2 + $0.0024 Γ 1000 Γ 9 = $72/year
Savings: $234/year (76%)
```
## Security Zones (OCI-Unique)
**OCI Security Zones** = Infrastructure-level policy enforcement:
```
Security Zone enforces:
β All storage encrypted
β No public buckets
β No internet gateways in VCN
β All databases private endpoint only
β Cloud Guard enabled
Enforcement: API rejects violating requests (preventive, not detective)
Example:
oci os bucket create --public-access-type ObjectRead
β FAILS if compartment is in Security Zone
Use case: Production, PCI-DSS, healthcare (mandatory controls)
```
**Gotcha**: Security Zones can break existing automation (test in dev first)
## Progressive Loading References
### OCI Well-Architected Checklist
**WHEN TO LOAD** [`oci-well-architected-checklist.md`](references/oci-well-architected-checklist.md):
- Running compliance checks against OCI tenancy
- Preparing for CIS OCI Foundations Benchmark audit
- Implementing automated security scanning
- Creating remediation scripts for common findings
- Setting up monitoring for drift detection
**Do NOT load** for:
- Quick anti-pattern reference (NEVER list above covers it)
- Architecture decisions (covered in this skill)
- Understanding OCI terminology (tables above)
### Official Oracle Documentation Sources
**Primary References** (50+ official sources scraped):
- [Well-Architected Framework for OCI](https://docs.oracle.com/en/solutions/oci-best-practices/)
- [OCI Architecture Center](https://docs.oracle.com/en/solutions/)
- [CIS OCI Foundations Benchmark](https://www.cisecurity.org/benchmark/oracle_cloud)
- [Security Best Practices](https://docs.oracle.com/en-us/iaas/Content/Security/Reference/security_best_practices.htm)
- [Compartment Design](https://docs.oracle.com/en-us/iaas/Content/Identity/Tasks/managingcompartments.htm)
- [Network Architecture Patterns](https://docs.oracle.com/en/solutions/oci-best-practices/)
**Note**: All anti-patterns, terminology mappings, and Always-Free limits in this skill are derived from official Oracle documentation and A-Team Oracle blog
---
## When to Use This Skill
- Architecture design: Multi-AD patterns, compartment strategy, VCN sizing
- Migration from AWS/Azure: Terminology mapping, anti-pattern avoidance
- Cost optimization: Free tier planning, Flex shapes, storage tiering
- Security: Cloud Guard tuning, Security Zones, NSG vs Security Lists
- Production readiness: SLA requirements, HA patterns, fault tolerance
This skill provides concise, actionable OCI best practices for architects and operators migrating from AWS/Azure, designing multi-AD deployments, or avoiding common OCI anti-patterns. It focuses on networking, tenancy structure, free-tier limits, Cloud Guard behavior, and multi-AD/fault-domain patterns to reduce downtime and operational risk. Use it to validate architecture choices and prevent irreversible mistakes like undersized VCNs or AD-bound subnets.
The skill codifies OCI-specific patterns and anti-patterns and highlights gotchas that general cloud knowledge often misses. It explains what to avoid (VCN sizing, AD-specific subnets, unsafe auto-remediation) and what to apply (regional subnets, NSGs vs security lists, Landing Zone modules, compartment strategies). It also summarizes Always-Free limits and cost-saving shape choices so you can design within constraints.
Can I resize a VCN later if I choose a small CIDR?
No. OCI VCNs cannot be resized; if the CIDR is too small you must create a new VCN and migrate resources, causing downtime and IP changes.
Does Cloud Guard auto-remediation always safe to enable?
No. Auto-remediation can break production (for example making a public bucket private). Run detectors in read-only, review findings, and only enable responders for trusted patterns.