
architectural-reviews skill

/00-meta-skills/architectural-reviews

This skill guides architectural reviews to reduce technical debt and boost system stability in large-scale Python projects.

npx playbooks add skill amnadtaowsoam/cerebraskills --skill architectural-reviews

---
name: Architectural Reviews
description: Expert-level framework for conducting architectural reviews to reduce technical debt and ensure system stability in enterprise-scale projects.
---

# Architectural Reviews

## Overview

Architectural review is the process of evaluating a system design before implementation. It reduces risk in large-scale systems, where an incorrect early decision can cost millions of dollars and take months to repair. This skill provides a framework for systematic reviews that assess requirements, scalability, security, maintainability, and operational considerations, so that teams can make informed architectural decisions that support long-term system health and business objectives.

## Why This Matters

- **Reduces Technical Debt**: Effective architectural reviews prevent costly rework and debt accumulation over the system lifecycle
- **Increases System Stability**: Identifies potential design flaws before production, reducing downtime and operational issues
- **Improves Team Velocity**: Provides clear design guidance that helps development teams work more efficiently
- **Reduces Maintenance Costs**: Proactively addresses issues that would otherwise require expensive fixes later
- **Ensures Investment Confidence**: Gives executives and stakeholders confidence that technical investments are sound

---

## Core Concepts

### 1. Review Triggers

Architectural reviews should be triggered at key decision points:

- **Project Initiation**: New projects or major initiatives
- **Technology Changes**: Significant technology stack changes or framework migrations
- **Major Modifications**: Architectural changes that affect system structure
- **Periodic Health Checks**: Regular reviews of existing systems (quarterly or twice a year)
- **Post-Incident**: Reviews after major incidents to prevent recurrence
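
The triggers above can be encoded as a simple check, for example in a change-request form or a bot. This is a minimal sketch, not a prescribed implementation: `ChangeRequest`, its field names, and the 90-day reading of "quarterly" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days; adjust to your cadence.
HEALTH_CHECK_INTERVAL = timedelta(days=90)


@dataclass
class ChangeRequest:
    """Hypothetical summary of a proposed change."""
    is_new_project: bool = False
    changes_tech_stack: bool = False
    changes_system_structure: bool = False
    follows_major_incident: bool = False


def review_triggers(change, last_review, today):
    """Return the name of every trigger that applies; an empty list means no review is due."""
    triggers = []
    if change.is_new_project:
        triggers.append("Project Initiation")
    if change.changes_tech_stack:
        triggers.append("Technology Changes")
    if change.changes_system_structure:
        triggers.append("Major Modifications")
    if today - last_review >= HEALTH_CHECK_INTERVAL:
        triggers.append("Periodic Health Checks")
    if change.follows_major_incident:
        triggers.append("Post-Incident")
    return triggers


migration = ChangeRequest(changes_tech_stack=True)
print(review_triggers(migration, date(2024, 1, 1), date(2024, 1, 15)))
# ['Technology Changes']
```

Returning every matching trigger, rather than a single yes/no, lets the review request state why it was opened.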

### 2. Review Types

Different types of reviews serve different purposes:

- **Design Review**: Evaluates proposed architecture before implementation
- **Code Review**: Examines implementation quality and alignment with design
- **Post-Implementation Review**: Assesses outcomes after deployment
- **Periodic Health Checks**: Ongoing monitoring of architectural health
- **Security Review**: Focused evaluation of security aspects

### 3. Checklist Framework

A comprehensive checklist ensures consistent review quality:

- **Requirements**: Are functional and non-functional requirements clear?
- **Scalability**: Can the system handle expected growth?
- **Security**: Are security measures adequate and appropriate?
- **Maintainability**: Is the design maintainable and evolvable?
- **Testability**: Is the system testable?
- **Cost**: Are costs reasonable and justified?
- **Operations**: Is the design operable and monitorable?
- **Technology Choices**: Are technology selections justified?
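
One way to keep reviews consistent is to rate each checklist area and derive an overall status from the ratings. The sketch below assumes a three-level `Rating` and a simple mapping rule (any fail rejects, any concern means changes are required); both are illustrative assumptions, not part of the framework itself.

```python
from enum import Enum


class Rating(Enum):
    PASS = "pass"
    CONCERN = "concern"
    FAIL = "fail"


# The eight checklist areas from the list above.
CHECKLIST_AREAS = [
    "Requirements", "Scalability", "Security", "Maintainability",
    "Testability", "Cost", "Operations", "Technology Choices",
]


def summarize(ratings):
    """Map per-area ratings to an overall status (the mapping rule is an assumption)."""
    missing = [area for area in CHECKLIST_AREAS if area not in ratings]
    if missing:
        return "incomplete: " + ", ".join(missing)
    values = set(ratings.values())
    if Rating.FAIL in values:
        return "rejected"
    if Rating.CONCERN in values:
        return "approved with changes"
    return "approved"


ratings = {area: Rating.PASS for area in CHECKLIST_AREAS}
ratings["Cost"] = Rating.CONCERN
print(summarize(ratings))  # approved with changes
```

Reporting unrated areas as "incomplete" prevents a review from being approved simply because nobody looked at, say, operations.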

### 4. Decision Framework

Establish clear processes for decision-making:

- Define decision criteria upfront
- Document all alternatives considered
- Record rationale for decisions
- Assign ownership for action items
- Set timelines for follow-up
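
The five points above map naturally onto a small record type that can report which parts of a decision are still undocumented. This is a sketch under assumptions: the `Decision` class and its field names are hypothetical, and the sample values are borrowed from the example report in this document.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Decision:
    """Hypothetical record mirroring the five points of the decision framework."""
    title: str
    criteria: list = field(default_factory=list)      # decision criteria, defined upfront
    alternatives: list = field(default_factory=list)  # alternatives considered
    rationale: str = ""
    owner: str = ""
    follow_up: Optional[date] = None

    def gaps(self):
        """List which parts of the framework are still undocumented."""
        checks = {
            "criteria": bool(self.criteria),
            "alternatives": bool(self.alternatives),
            "rationale": bool(self.rationale.strip()),
            "owner": bool(self.owner.strip()),
            "follow_up": self.follow_up is not None,
        }
        return [name for name, ok in checks.items() if not ok]


d = Decision(
    title="Use PostgreSQL for primary database",
    alternatives=["MongoDB"],
    rationale="MongoDB doesn't meet consistency requirements",
    owner="Dave",
)
print(d.gaps())  # ['criteria', 'follow_up']
```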

## Quick Start

1. **Initiate Review**: Create a review request with architectural documents, diagrams, and requirements
2. **Prepare Materials**: Prepare C4 diagrams (Context, Container, Component), sequence diagrams, and ADRs; share materials 48 hours in advance
3. **Assemble Review Team**: Invite Architect (lead), Technical Lead, Security Engineer, DevOps Engineer, Product Owner, and developer representatives
4. **Conduct Review**: Follow agenda - 15-30 min presentation, 30-45 min Q&A, 15 min for decisions and action items
5. **Document Outcomes**: Record status (approved/rejected/deferred), decisions, concerns, and action items with owners and due dates
6. **Follow Up**: Track action items until completion; schedule follow-up review if needed
7. **Close Review**: Mark review as complete when all action items are resolved; document lessons learned

```markdown
# Architecture Review: [Project Name]

**Date:** 2024-01-15
**Reviewers:** Alice (Architect), Bob (Security), Carol (DevOps)
**Presenter:** Dave (Tech Lead)

## Summary

**Status:** ✅ Approved with Minor Changes

## Decisions

### Approved
- Use PostgreSQL for primary database
- Implement REST API with FastAPI
- Deploy on Kubernetes

### Deferred
- GraphQL API (revisit in Q2)
- Multi-region deployment (Phase 2)

### Rejected
- MongoDB (doesn't meet consistency requirements)
- Serverless architecture (operational complexity)

## Action Items

1. Add Redis redundancy (Carol, 2024-01-20)
2. Conduct database load testing (Dave, 2024-01-22)
3. Create cost optimization plan (Dave, 2024-01-25)
```
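
Steps 6 and 7 depend on knowing which action items are still open. A minimal tracking sketch follows; the `ActionItem` class is hypothetical, the items come from the example report above, and marking the Redis item as done is an assumption made so that the output is interesting.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items, today):
    """Open items whose due date has passed."""
    return [item for item in items if not item.done and item.due < today]


def can_close(items):
    """A review closes only when every action item is resolved (step 7)."""
    return all(item.done for item in items)


items = [
    ActionItem("Add Redis redundancy", "Carol", date(2024, 1, 20), done=True),
    ActionItem("Conduct database load testing", "Dave", date(2024, 1, 22)),
    ActionItem("Create cost optimization plan", "Dave", date(2024, 1, 25)),
]
print([item.description for item in overdue(items, date(2024, 1, 23))])
# ['Conduct database load testing']
print(can_close(items))  # False
```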

## Production Checklist

- [ ] Architecture diagrams created (C4 model - Context, Container, Component)
- [ ] Design decisions documented (ADRs)
- [ ] Review scheduled and stakeholders invited
- [ ] Presentation prepared (15-30 minutes)
- [ ] Security review completed (OWASP Top 10, threat modeling)
- [ ] Performance requirements defined
- [ ] Scalability plan documented
- [ ] Failure modes identified (single points of failure)
- [ ] Operational requirements defined (monitoring, alerting)
- [ ] Technology choices justified with alternatives
- [ ] Cost analysis completed
- [ ] Action items assigned with owners and due dates
- [ ] Follow-up review scheduled if needed
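
If the checklist is kept as a markdown file next to the design documents, open items can be listed automatically. The sketch below assumes GitHub-style task list syntax (`- [ ]` and `- [x]`); the function name `open_items` is illustrative.

```python
import re

# Matches task-list items such as "- [ ] text" or "- [x] text".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def open_items(markdown):
    """Return the text of every unchecked task-list item."""
    result = []
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if match and match.group(1) == " ":
            result.append(match.group(2).strip())
    return result


sample = """\
- [x] Architecture diagrams created (C4 model - Context, Container, Component)
- [x] Design decisions documented (ADRs)
- [ ] Cost analysis completed
"""
print(open_items(sample))  # ['Cost analysis completed']
```

A script like this could back a merge gate that fails while items remain open, though whether to block on every item is a team decision.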

## Anti-patterns

1. **Over-Engineering**: Conducting complex reviews for small projects where simple assessments would suffice
2. **Bikeshedding**: Spending excessive time debating minor details while ignoring critical issues
3. **Missing Follow-up**: Failing to track action items to completion, undermining the review's value
4. **Lack of Documentation**: Not recording decisions and rationales clearly, leading to repeat discussions
5. **Ignoring Context**: Applying rigid standards without considering project-specific constraints and needs
6. **One-Size-Fits-All**: Using the same review depth and process for all projects regardless of complexity

## Integration Points

- **Architecture Decision Records (ADRs)**: Link reviews to specific decisions documented in ADRs
- **CI/CD Pipelines**: Integrate review requirements as gates before merging code
- **Documentation Platforms**: Confluence, Notion, GitHub Wiki for storing review reports
- **Diagram Tools**: Structurizr, Mermaid, PlantUML for architecture visualization (for example, C4 model diagrams)
- **Project Management**: Jira, Azure DevOps for tracking review action items
- **Security Processes**: Threat modeling, security audits as part of reviews

## Further Reading

- [C4 Model](https://c4model.com/) - System context and component diagramming
- [Architecture Decision Records](https://adr.github.io/) - Documenting architectural decisions
- [Software Architecture in Practice](https://www.sei.cmu.edu/publications/books/software-architecture-in-practice.cfm)
- [Fundamentals of Software Architecture](https://www.oreilly.com/library/view/fundamentals-of-software/9781492043447/)
- [Architecture Decision Record examples and templates](https://github.com/joelparkerhenderson/architecture-decision-record)
- [TOGAF](https://www.opengroup.org/togaf) - The Open Group Architecture Framework
- [ISO/IEC 25010](https://iso25000.com/index.php/en/iso-25000-standards/iso-25010) - Software Quality Model

---

## Review Process and Workflow

### Process Flow

```
1. Request Review
   ↓
2. Prepare Materials
   ↓
3. Schedule Review
   ↓
4. Conduct Review
   ↓
5. Document Decisions
   ↓
6. Follow-up Actions
   ↓
7. Close Review
```
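
The flow above is strictly linear, so it can be modeled as a small state machine that refuses to skip stages. This is one possible sketch; the `Stage` names are paraphrased from the seven steps and `advance` is a hypothetical helper.

```python
from enum import Enum


class Stage(Enum):
    REQUESTED = 1
    MATERIALS_PREPARED = 2
    SCHEDULED = 3
    CONDUCTED = 4
    DECISIONS_DOCUMENTED = 5
    FOLLOW_UP = 6
    CLOSED = 7


def advance(stage):
    """Move to the next stage; stages cannot be skipped and CLOSED is terminal."""
    if stage is Stage.CLOSED:
        raise ValueError("review is already closed")
    return Stage(stage.value + 1)


stage = Stage.REQUESTED
while stage is not Stage.CLOSED:
    stage = advance(stage)
    print(stage.name)
```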

### Preparation Checklist

**For Presenter:**
```markdown
- [ ] Create architecture diagrams (C4 model)
- [ ] Document design decisions (ADRs)
- [ ] Prepare presentation (15-30 min)
- [ ] List open questions
- [ ] Share materials 48 hours before review
```

**For Reviewers:**
```markdown
- [ ] Review materials beforehand
- [ ] Prepare questions
- [ ] Research unfamiliar technologies
- [ ] Review similar past projects
```

## Participants and Roles

### Review Team

**Architect (Lead Reviewer)**
- Evaluates overall design
- Ensures alignment with standards
- Identifies architectural issues

**Technical Lead**
- Assesses implementation feasibility
- Reviews technology choices
- Estimates effort

**Security Engineer**
- Reviews security aspects
- Identifies vulnerabilities
- Ensures compliance

**DevOps Engineer**
- Assesses operational complexity
- Reviews deployment strategy
- Evaluates monitoring approach

**Product Owner**
- Validates requirements alignment
- Assesses business value
- Prioritizes concerns

**Developer Representatives**
- Provide implementation perspective
- Ask clarifying questions
- Identify potential issues

## Review Documentation

### Review Report Template

```markdown
# Architecture Review: [Project Name]

**Date:** 2024-01-15
**Reviewers:** Alice (Architect), Bob (Security), Carol (DevOps)
**Presenter:** Dave (Tech Lead)

## Summary

**Status:** ✅ Approved with Minor Changes

**Overall Assessment:**
The proposed architecture is sound and meets requirements. A few minor
concerns need to be addressed before implementation.

## Requirements Review

✅ **Functional Requirements:** Well-defined and achievable
✅ **Non-Functional Requirements:** Clearly specified
⚠️ **Constraints:** Budget constraint may be tight

## Architecture Assessment

### Strengths
- Clean separation of concerns
- Scalable design
- Good use of caching
- Comprehensive monitoring plan

### Concerns
1. **Database Choice** (Medium Priority)
   - PostgreSQL may struggle with write-heavy workload
   - Consider: Evaluate write performance under load
   - Owner: Dave
   - Due: 2024-01-22

2. **Single Point of Failure** (High Priority)
   - Redis cache has no redundancy
   - Consider: Add Redis Sentinel or Cluster
   - Owner: Carol
   - Due: 2024-01-20

3. **Cost** (Low Priority)
   - Estimated costs are at budget limit
   - Consider: Identify cost optimization opportunities
   - Owner: Dave
   - Due: 2024-01-25

## Decisions

### Approved
- Use PostgreSQL for primary database
- Implement REST API with FastAPI
- Deploy on Kubernetes

### Deferred
- GraphQL API (revisit in Q2)
- Multi-region deployment (Phase 2)

### Rejected
- MongoDB (doesn't meet consistency requirements)
- Serverless architecture (operational complexity)

## Action Items

1. Add Redis redundancy (Carol, 2024-01-20)
2. Conduct database load testing (Dave, 2024-01-22)
3. Create cost optimization plan (Dave, 2024-01-25)
4. Update architecture diagrams (Dave, 2024-01-18)
5. Write ADRs for key decisions (Dave, 2024-01-19)

## Next Steps

- Address action items
- Schedule follow-up review (if needed): 2024-01-26
- Proceed with implementation after action items complete

## Appendix

- Architecture diagrams: [link]
- ADRs: [link]
- Requirements doc: [link]
```

## Common Review Patterns

### 1. Presentation + Q&A

```
Format:
- 15-30 min presentation
- 30-45 min Q&A and discussion
- 15 min decision and action items

Best for:
- Major architectural decisions
- New projects
- Complex designs
```

### 2. Written RFC + Async Comments

```
Format:
- Author writes detailed RFC
- Reviewers comment asynchronously
- Optional sync meeting for discussion

Best for:
- Distributed teams
- Less urgent decisions
- Well-defined problems
```

### 3. Lightweight Check-ins

```
Format:
- 15-30 min quick review
- Focus on specific aspect
- Informal discussion

Best for:
- Minor changes
- Progress checks
- Specific questions
```

## Red Flags to Look For

### Over-Engineering

```
🚩 Red Flags:
- Using microservices for small app
- Complex patterns for simple problems
- Premature optimization
- Technology for technology's sake

Questions to Ask:
- Do we really need this complexity?
- What's the simplest solution?
- Can we start simpler and evolve?
```

### Under-Engineering

```
🚩 Red Flags:
- No consideration of scale
- No error handling
- No monitoring
- No security measures
- "We'll add that later"

Questions to Ask:
- What happens when this grows?
- How will we know if it breaks?
- What if someone attacks this?
```

### Missing Non-Functional Requirements

```
🚩 Red Flags:
- No performance targets
- No availability requirements
- No security considerations
- No scalability plan

Questions to Ask:
- How fast should this be?
- How much downtime is acceptable?
- How many users will we have?
```

### Single Points of Failure

```
🚩 Red Flags:
- Single database instance
- No redundancy
- No failover mechanism
- Critical dependency on external service

Questions to Ask:
- What happens if this fails?
- Do we have a backup?
- Can we survive an outage?
```
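
A first pass at spotting single points of failure can be automated from a component inventory. The sketch below flags anything running fewer than two instances; the inventory and its component names are hypothetical, and instance count is only one signal (it says nothing about failover or external dependencies).

```python
def single_points_of_failure(replicas):
    """Return components that run fewer than two instances, sorted by name."""
    return sorted(name for name, count in replicas.items() if count < 2)


# Hypothetical inventory: component name -> number of running instances.
inventory = {"api": 3, "postgres-primary": 1, "redis": 1, "worker": 2}
print(single_points_of_failure(inventory))  # ['postgres-primary', 'redis']
```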

### Tight Coupling

```
🚩 Red Flags:
- Services directly calling each other
- Shared database between services
- No abstraction layers
- Hard-coded dependencies

Questions to Ask:
- Can we change one component without affecting others?
- Are responsibilities clearly separated?
- Can we test components independently?
```
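
The "shared database between services" red flag is easy to check mechanically once each service declares the datastores it uses. A minimal sketch, assuming a hypothetical service map:

```python
from collections import defaultdict


def shared_datastores(service_to_stores):
    """Return {datastore: [services]} for every datastore used by more than one service."""
    users = defaultdict(list)
    for service, stores in service_to_stores.items():
        for store in stores:
            users[store].append(service)
    return {store: sorted(svcs) for store, svcs in users.items() if len(svcs) > 1}


# Hypothetical service map: service name -> datastores it reads or writes.
services = {
    "orders": ["orders_db", "shared_db"],
    "billing": ["billing_db", "shared_db"],
    "catalog": ["catalog_db"],
}
print(shared_datastores(services))  # {'shared_db': ['billing', 'orders']}
```

A hit is a prompt for discussion rather than an automatic rejection; some sharing may be a deliberate trade-off.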

### Technology Choices Without Justification

```
🚩 Red Flags:
- "Let's use X because it's cool"
- No comparison of alternatives
- Team has no experience with technology
- No consideration of operational complexity

Questions to Ask:
- Why this technology?
- What alternatives did you consider?
- Does the team have expertise?
- What's the learning curve?
```

## Feedback Delivery Best Practices

### Do ✅

**Be Specific**
```
❌ "This design is bad"
✅ "The database choice may not handle the write-heavy workload. Consider..."
```

**Focus on Issues, Not People**
```
❌ "You didn't think about security"
✅ "We should add authentication to this endpoint"
```

**Provide Alternatives**
```
❌ "This won't work"
✅ "This approach may have issues with X. Have you considered Y?"
```

**Ask Questions**
```
❌ "This is wrong"
✅ "Can you explain the reasoning behind this decision?"
```

**Prioritize Feedback**
```
✅ "Critical: Add authentication"
✅ "Nice to have: Consider adding caching"
```

### Don't ❌

**Be Vague**
```
❌ "I don't like this"
❌ "This feels wrong"
```

**Be Dismissive**
```
❌ "This will never work"
❌ "We tried this before and it failed"
```

**Bikeshed**
```
❌ Spending 30 minutes debating variable names
❌ Arguing about code formatting
```

**Demand Perfection**
```
❌ "This needs to handle every edge case"
❌ "Rewrite everything"
```

## Architecture Decision Outcome Tracking

### Decision Log

```markdown
# Architecture Decision Log

| Date | Decision | Status | Outcome | Lessons Learned |
|------|----------|--------|---------|-----------------|
| 2024-01-15 | Use PostgreSQL | Implemented | ✅ Working well | Good choice for our use case |
| 2024-02-01 | Microservices | Implemented | ⚠️ More complex than expected | Should have started with monolith |
| 2024-03-01 | GraphQL API | Rejected | N/A | REST was simpler for our needs |
```
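
Because the log is a plain markdown table, it can be queried with a few lines of code, for instance to list rejected decisions before reopening an old debate. This sketch assumes a simple pipe table with no escaped `|` characters inside cells.

```python
def parse_markdown_table(text):
    """Parse a simple pipe table into a list of dicts keyed by header."""
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    cells = [[c.strip() for c in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| separator row
    return [dict(zip(header, row)) for row in body]


log = """\
| Date | Decision | Status | Outcome | Lessons Learned |
|------|----------|--------|---------|-----------------|
| 2024-01-15 | Use PostgreSQL | Implemented | ✅ Working well | Good choice for our use case |
| 2024-03-01 | GraphQL API | Rejected | N/A | REST was simpler for our needs |
"""
rejected = [row["Decision"] for row in parse_markdown_table(log) if row["Status"] == "Rejected"]
print(rejected)  # ['GraphQL API']
```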

### Retrospective Template

```markdown
# Architecture Retrospective: [Decision]

**Decision:** Use microservices architecture
**Date Made:** 2024-02-01
**Date Reviewed:** 2024-08-01 (6 months later)

## What We Expected
- Faster development (independent teams)
- Better scalability
- Technology flexibility

## What Actually Happened
- Development slower initially (learning curve)
- Operational complexity higher than expected
- Debugging more difficult

## What Went Well
- Can scale services independently
- Team autonomy improved
- Deployment flexibility

## What Didn't Go Well
- Distributed tracing was hard to set up
- More infrastructure costs
- Network latency issues

## Lessons Learned
- Start with monolith, extract services later
- Invest in observability from day one
- Underestimated operational complexity

## Would We Do It Again?
⚠️ Maybe - with better preparation and tooling

## Recommendations
- For similar projects: Start with modular monolith
- If doing microservices: Invest heavily in DevOps
```

## Tools

### C4 Diagrams

```
Level 1: System Context
┌─────────────┐
│   Users     │
└──────┬──────┘
       ↓
┌─────────────┐      ┌─────────────┐
│   System    │─────→│  External   │
│             │      │   System    │
└─────────────┘      └─────────────┘

Level 2: Container Diagram
┌──────────────────────────────────┐
│         System                   │
│  ┌────────┐    ┌────────┐       │
│  │  Web   │───→│  API   │       │
│  │  App   │    │ Server │       │
│  └────────┘    └────┬───┘       │
│                     ↓            │
│              ┌────────┐          │
│              │Database│          │
│              └────────┘          │
└──────────────────────────────────┘

Level 3: Component Diagram
Level 4: Code Diagram
```

### Sequence Diagrams

```
User → API: POST /order
API → Database: Check inventory
Database → API: Inventory available
API → Payment: Process payment
Payment → API: Payment successful
API → Queue: Publish order event
API → User: Order confirmed
Queue → Worker: Process order
Worker → Database: Update inventory
```
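
A textual flow like the one above can be turned into a renderable diagram. The sketch below emits Mermaid sequence-diagram syntax (`sender->>receiver: message`) from a list of tuples; `to_mermaid` is a hypothetical helper, and only the first three messages of the flow are shown.

```python
def to_mermaid(messages):
    """Render (sender, receiver, text) tuples as a Mermaid sequence diagram."""
    lines = ["sequenceDiagram"]
    for sender, receiver, text in messages:
        lines.append(f"    {sender}->>{receiver}: {text}")
    return "\n".join(lines)


flow = [
    ("User", "API", "POST /order"),
    ("API", "Database", "Check inventory"),
    ("Database", "API", "Inventory available"),
]
print(to_mermaid(flow))
```

Keeping the flow as data makes it straightforward to regenerate the diagram when the design changes during review.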

### Architecture Views (4+1 Model)

```
1. Logical View (Functionality)
   - What the system does
   - Class diagrams, component diagrams

2. Process View (Concurrency)
   - How the system runs
   - Sequence diagrams, activity diagrams

3. Development View (Organization)
   - How code is organized
   - Package diagrams, module structure

4. Physical View (Deployment)
   - Where components run
   - Deployment diagrams, infrastructure

+1. Scenarios (Use Cases)
   - How users interact
   - Use case diagrams, user stories
```

## Real Examples of Review Findings

### Example 1: Database Scaling Issue

**Finding:**
```
Design proposed single PostgreSQL instance for e-commerce platform
expecting 100K users.

Concern: A single instance may not handle future load growth
```

**Discussion:**
```
Reviewer: "How many transactions per second do you expect?"
Designer: "About 1000 TPS at peak"
Reviewer: "Single Postgres can handle that, but what about growth?"
Designer: "We'll add read replicas when needed"
Reviewer: "What about write scaling?"
Designer: "We could shard by user ID if needed"
```

**Outcome:**
```
✅ Approved with recommendation:
- Start with single instance + read replicas
- Plan sharding strategy for future
- Monitor write load closely
- Document scaling triggers
```

### Example 2: Security Vulnerability

**Finding:**
```
API design had no authentication on admin endpoints.

Concern: Critical security vulnerability
```

**Discussion:**
```
Reviewer: "I don't see authentication on /admin endpoints"
Designer: "Oh, we'll add that later"
Reviewer: "This is a critical security issue"
Designer: "You're right, we should add it now"
```

**Outcome:**
```
❌ Rejected - must fix before approval
- Add JWT authentication
- Implement role-based access control
- Add rate limiting
- Security audit before deployment
```

### Example 3: Over-Engineering

**Finding:**
```
Design proposed microservices architecture for simple CRUD app
with 3 developers and 1000 users.

Concern: Unnecessary complexity
```

**Discussion:**
```
Reviewer: "Why microservices for this?"
Designer: "For scalability and team autonomy"
Reviewer: "You have 3 developers and 1000 users"
Designer: "But we might grow"
Reviewer: "Start simple, refactor when needed"
```

**Outcome:**
```
✅ Approved with changes:
- Start with modular monolith
- Design for future extraction
- Revisit architecture at 10K users
- Document service boundaries now
```

## Best Practices

1. **Review Early** - Before implementation starts
2. **Be Prepared** - Share materials in advance
3. **Stay Focused** - Stick to architecture, not implementation details
4. **Be Constructive** - Suggest alternatives, don't just criticize
5. **Document Decisions** - Write ADRs for key decisions
6. **Follow Up** - Track action items
7. **Learn** - Conduct retrospectives
8. **Be Respectful** - Focus on design, not designer
9. **Time-box** - Don't let reviews drag on
10. **Iterate** - Reviews are conversations, not one-time events

Overview

This skill provides an expert-level framework for conducting architectural reviews that reduce technical debt and improve system stability in enterprise-scale projects. It codifies triggers, review types, checklists, decision rules, and participant roles so teams make informed, repeatable architecture decisions. The goal is to catch design risks early, document rationale, and ensure follow-through on remediation.

How this skill works

The framework inspects proposed designs against a checklist covering requirements, scalability, security, maintainability, testability, cost, and operations. It prescribes review types (design, security, post-implementation, periodic health checks), preparation steps (C4 diagrams, ADRs, sequence flows) and a meeting agenda for presentation, Q&A, and decision-making. Outcomes are recorded as approved/deferred/rejected with action items, owners, and due dates and tracked until closure.

When to use it

  • At project initiation or before major implementation work
  • When changing core technologies or migrating frameworks
  • After incidents to prevent recurrence
  • For periodic health checks of long-lived systems
  • Before major scope changes or multi-team integrations

Best practices

  • Share diagrams and ADRs at least 48 hours before the review
  • Assemble a cross-functional team: architect, security, DevOps, product, and developers
  • Use clear decision criteria, record alternatives and rationale, and assign owners for action items
  • Prioritize feedback (critical vs. nice-to-have) and avoid bikeshedding
  • Track and close action items; schedule follow-up reviews when needed

Example use cases

  • Reviewing a new microservices design to validate scalability and failure modes
  • Approving a migration from monolith to containerized deployments with cost and operational analysis
  • Post-incident architecture review to harden system resilience and eliminate single points of failure
  • Security-focused review for a public API implementation and threat-model validation
  • Quarterly health check of a core platform to identify accumulating technical debt

FAQ

Who should convene an architecture review?

Typically the architect or technical lead requests the review and invites stakeholders including security, DevOps, product, and representative developers.

What artifacts are required to start a review?

At minimum provide requirements, C4-style diagrams, sequence diagrams, and ADRs. Share materials 48 hours in advance for effective review.

How do I prevent follow-up items from being ignored?

Record action items with clear owners and due dates, track them in your project tool (Jira/Azure DevOps), and require closure or a follow-up review before implementation proceeds.